
Discussion Intel current and future Lakes & Rapids thread


jpiniero

Lifer
Oct 1, 2010
16,841
7,285
136
Looks like Intel may be focusing on throughput over latency with the server parts. Wonder if L3 is a victim cache like SKL-SP.

Yeah I would expect them to not have changed the cache hierarchy. 1.25 MB L2/core and 1.5 MB L3/core is definitely strange.
 

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
It is a victim cache for sure. Inclusive L3 makes no sense with those sizes. Overall rather anemic increase in cache sizes over Skylake on 14nm. 128KB more of L3 and 256KB more of L2 per core? I guess Intel, in its quest to increase core counts and include AVX512 units, has put caches on a diet.
All while AMD cores enjoy a massive 4MB of L3 per core out of 16MB total, soon to be improved to 32MB total.

Does not bode well for Intel. I expected more from 10nm in 2020.
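For anyone who wants to check the per-core arithmetic, here is a minimal sketch. The Ice Lake-SP figures are the ones quoted in this thread; the Skylake-SP baseline (1 MB L2, 1.375 MB L3 per core) and the 4-core CCX are what the quoted deltas imply, and the function names are made up for illustration.

```python
KB_PER_MB = 1024

# Per-core cache sizes in MB.
# Ice Lake-SP: figures quoted in the thread.
# Skylake-SP: baseline implied by the "+256KB L2 / +128KB L3" deltas.
skylake_sp = {"L2": 1.0, "L3": 1.375}
ice_lake_sp = {"L2": 1.25, "L3": 1.5}


def per_core_delta_kb(old, new):
    """Per-core growth of each cache level, in KB."""
    return {level: (new[level] - old[level]) * KB_PER_MB for level in old}


def l3_share_mb(total_l3_mb, cores_sharing):
    """Naive per-core share of a shared L3, in MB (every core can still use all of it)."""
    return total_l3_mb / cores_sharing


print(per_core_delta_kb(skylake_sp, ice_lake_sp))  # {'L2': 256.0, 'L3': 128.0}
print(l3_share_mb(16, 4))  # 4.0 (16MB shared by 4 cores)
print(l3_share_mb(32, 8))  # 4.0 (32MB shared by 8 cores)
```

Note that the "per core" number for a shared L3 is only a bookkeeping figure: a single core can fill far more than its share if the others are idle.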
 

ajc9988

Senior member
Apr 1, 2015
278
171
116
It is a victim cache for sure. Inclusive L3 makes no sense with those sizes. Overall rather anemic increase in cache sizes over Skylake on 14nm. 128KB more of L3 and 256KB more of L2 per core? I guess Intel, in its quest to increase core counts and include AVX512 units, has put caches on a diet.
All while AMD cores enjoy a massive 4MB of L3 per core out of 16MB total, soon to be improved to 32MB total.

Does not bode well for Intel. I expected more from 10nm in 2020.
If you are going to mention size *laughs like a teenage boy for a second*, you need to also discuss latency. With the 16MB L3 currently, there is 40ns of latency. For the 32MB upcoming, it is 47ns latency. So it may be bigger, but it is also slower.

Now, AMD can take it because of the chiplet design and ram latency with their design. But Intel has much lower memory latency (faster) than AMD with the monolithic die and the integrated memory controller right there.

Because of this, if Intel made caches too large, the cache seek times go up which would slow their time to just go to ram. It's a balancing act.

Also, Intel's L2$ is way faster than their L3 or AMD's L3$. So having that be larger actually does show a speedup over a smaller L2.

Once Intel starts stacking, this might change, but instead of using MCM, they seem to want to use an active interposer with cache and I/O on that chip shared between their big and little cores (Lakefield IIRC). This might increase seek time marginally, but less than going to the I/O die does for AMD. The TSV is faster. So the proper amount of L3 may still be lower than AMD's. This is especially true if 1GB L4 or DDR or SRAM or something is stuck on the chip before the longer trip and there is a proper memory prefetch for that.

But that's kind of getting into the weeds there.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,787
136
Overall rather anemic increase in cache sizes over Skylake on 14nm. 128KB more of L3 and 256KB more of L2 per core?

It's not a big deal for server especially if the future is having on package HBM-class memory.

More often than in client, the data spills out of the caches and accesses memory, and/or the dataset is larger. HBM is all about bandwidth, not latency, but the enormous increase will still benefit a lot of workloads.

Of course it's also a balancing act as @uzzi38 points out. Something like 48 or 56 Willow Cove based cores on 10nm would be close to the reticle limit with Tiger Lake-like caches. Never mind rumors from MLiD saying it might go up to 72 all on a single die!

This is especially true if 1GB L4 or DDR or SRAM or something is stuck on the chip before the longer trip and there is a proper memory prefetch for that.

Back when Xeon Phi was alive and well, there were patents going around with a chip very similar to future Phi with very large L4 caches not using SRAM. They've been doing research into alternative cache memories. They actually demonstrated production-ready STT-MRAM and ReRAM on their 22FFL (read: the process used for the active interposer in Foveros, and a very low leakage process).

We'll see what they really use.
 
  • Like
Reactions: uzzi38 and ajc9988

itsmydamnation

Diamond Member
Feb 6, 2011
3,079
3,915
136
If you are going to mention size *laughs like a teenage boy for a second*, you need to also discuss latency. With the 16MB L3 currently, there is 40ns of latency. For the 32MB upcoming, it is 47ns latency. So it may be bigger, but it is also slower.

Now, AMD can take it because of the chiplet design and ram latency with their design. But Intel has much lower memory latency (faster) than AMD with the monolithic die and the integrated memory controller right there.
In big multicore systems it's about bandwidth amplification more than latency. Intel is somewhere around 1/2 the cores with 8? memory channels for Ice Lake, so they can afford more memory bandwidth per core.
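To put rough numbers on the bandwidth-per-core point, here is a back-of-the-envelope sketch. Everything specific in it is an assumption for illustration: DDR4-3200, a 64-bit (8-byte) channel, and hypothetical 32-core and 64-core parts that both have 8 channels. These are theoretical peaks, not measured figures.

```python
def peak_dram_bandwidth_gbs(channels, mega_transfers_per_s, bus_bytes=8):
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes).

    Each channel moves bus_bytes per transfer, so the peak is
    channels * transfers/s * bytes/transfer.
    """
    return channels * mega_transfers_per_s * 1e6 * bus_bytes / 1e9


def bandwidth_per_core_gbs(total_gbs, cores):
    """Even split of the total bandwidth across all cores."""
    return total_gbs / cores


# Assumed configuration: 8 channels of DDR4-3200 on both parts.
total = peak_dram_bandwidth_gbs(channels=8, mega_transfers_per_s=3200)
print(f"peak: {total:.1f} GB/s")

# Hypothetical core counts, chosen only to show the 1/2-the-cores effect.
for cores in (32, 64):
    per_core = bandwidth_per_core_gbs(total, cores)
    print(f"{cores} cores: {per_core:.1f} GB/s per core")
```

With the same channel count, halving the core count doubles the peak bandwidth available to each core, which is the trade-off described above.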
 

DrMrLordX

Lifer
Apr 27, 2000
22,945
13,031
136
Was PCIe 4.0 on Z490 always planned?

I don't recall PCIe 4.0 ever being in the mix for Comet. But when Comet Lake-S was announced as being a Q4 2019 product, Rocket Lake was still a mysterious force in the distant future, and frankly Intel hadn't provided us with much information about connectivity on any of their future platforms.

Anyway Rocket Lake brings some much needed platform features - dedicated x4 PCIe for NVMe and x8 DMI link to chipset, but I would hope that those aren't too difficult to get right once PCIe 4.0 support has been laid out.

Those are the kind of features that Intel is supposed to do better than anyone else.

However, until I see Intel becoming a lot more talkative about their future plans I will always expect delays for high volumes, no matter the node.

Intel's 2020 and 2021 server picture may have a big effect on how Rocket plays out (especially launch delays and volume).

Right now, Intel is still limping along on Cascade Lake in the server room. Cooper is extremely limited and Ice Lake-SP is MIA. Nearly anything Intel sells is Cascade Lake. And they are still selling it - for now. It's really questionable as to how long anyone will continue buying Cascade Lake for any reason, but hey give Intel credit for leveraging all their sales talent to keep the gravy train running.

Cascade Lake is eating a lot of 14nm capacity.

IF Ice Lake-SP FINALLY launches in some non-zero volume and then Intel miraculously hits with Sapphire Rapids in 2021 at a good clip, depending on how Intel massages the market (namely: can they convince server hardware customers of various stripes to hold off on Milan long enough to give Sapphire Rapids a chance in Q3 2021 and ignore Genoa coming uh whenever?), their demand for 14nm wafers may crater just from buyers holding off for Sapphire Rapids. Which means that lingering Comet Lake products and newly-launched Rocket Lake products will have many more wafers to play with throughout all of 2021.

And that all assumes Intel will have enough 10nm wafers available to serve their customers with Sapphire Rapids, which is another unknown. But I digress.

One of the problems with the Comet Lake-S launch is that several SKUs are plagued by shortages, and Cascade Lake is at least partially responsible for that. Take 14nm server CPUs out of the picture and that problem vanishes for Rocket. At that point it's a matter of whether Intel has pulled together the design quickly enough to successfully launch in Q1 2021.

As I stated above, Intel has historically been really good on platform (crap like the i820 notwithstanding). I personally do not see PCIe 4.0 and x8 DMI being a primary driver in Rocket Lake-S delays. I see the backport effort itself as being the likely culprit (if there is any). Can Intel hit the clockspeed and power targets they want using a modified Sunny/Willow Cove core backported to 14nm? We already know they're having to retreat on core count. Or at least we reasonably suspect, and all early indicators point in that direction.

And even if Rocket does perform basically as planned (5.0 GHz turbo limits, +2x% IPC gains), can Intel pull together enough wafers to sell a meaningful number to anyone? Or are they just gonna have to keep schlepping Cascade Lake at deeper and deeper discounts due to delayed Sapphire Rapids and delayed/failed Ice Lake-SP? If Intel's 10nm server products don't show up in sufficient volume to take pressure off the 14nm fabs, the only way Intel will be able to sell lots of Rocket dice is if customers abandon Intel's server lineup entirely (Intel will fight like hell to prevent that, even if it means pushing more Cascade Lake throughout 2021).

Delay was I believe because they delayed production to open up room for more Cascade Lake XCC.

See above, but if Rocket also has to compete for wafers with Cascade Lake then it could be the same rodeo.
 
  • Like
Reactions: spursindonesia

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
If you are going to mention size *laughs like a teenage boy for a second*, you need to also discuss latency. With the 16MB L3 currently, there is 40ns of latency. For the 32MB upcoming, it is 47ns latency. So it may be bigger, but it is also slower.

I really hope you don't believe AMD (or Intel or anyone else) has an L3 cache with 40ns of latency. It is ~11ns versus ~17ns for Intel, and it is reasonable to expect that latency will increase some 3-4ns for 32MB of unified L3 on 8C chips with an unknown "architecture". It might still be much faster than Intel's server chip L3. That depends on ICL-SP uncore clocks, hop count and the optimizations Intel put into the chip.

Now, AMD can take it because of the chiplet design and ram latency with their design. But Intel has much lower memory latency (faster) than AMD with the monolithic die and the integrated memory controller right there.

On SKL-SP memory latency is not exactly great either, even when running a 3.2GHz uncore; at 2.4GHz it is anemic in latency, bandwidth and cache size. Server chips run 2.4GHz and 6-channel controllers split into 2 clusters, so average latency is not exactly stellar. ~90ns is what AnandTech found for Intel and 110ns for AMD's Rome.

Because of this, if Intel made caches too large, the cache seek times go up which would slow their time to just go to ram. It's a balancing act.

What is the point of L3 cache? To let a workload with working set larger than L2 cache size spill over to cache instead of going all the way to the memory that has 90ns latency and has limited BW. Except on Intel L3 is just wrong:
1) L3 is small, so cores compete for a very limited resource
2) Even worse, L3 bandwidth is anemic and "comparable" to memory bandwidth. That is, once you are above L2, you might as well go to memory, because several cores will eat the BW.
3) L3 is competing for chip wide resource -> mesh bandwidth. So core that is using L3 might hurt other cores if some bottleneck on mesh arises.
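The spill-over argument can be sketched with a simple expected-latency model. It is a deliberately simplified illustration: it uses the ~17ns L3 and ~90ns memory figures quoted in this post, the hit rates are hypothetical, and it ignores the bandwidth and mesh contention raised in points 2 and 3.

```python
def post_l2_latency_ns(l3_hit_rate, l3_ns, mem_ns):
    """Expected latency of an access that has already missed L2.

    Simplified model: a fraction l3_hit_rate is served by L3 at l3_ns,
    the rest goes to memory at mem_ns (taken as the full round trip).
    """
    if not 0.0 <= l3_hit_rate <= 1.0:
        raise ValueError("l3_hit_rate must be between 0 and 1")
    return l3_hit_rate * l3_ns + (1.0 - l3_hit_rate) * mem_ns


L3_NS = 17.0   # ~17ns Intel L3 latency, as quoted in the post
MEM_NS = 90.0  # ~90ns memory latency, as quoted in the post

# Hypothetical hit rates: a small, contended L3 pushes this number down.
for hit_rate in (0.2, 0.5, 0.8):
    latency = post_l2_latency_ns(hit_rate, L3_NS, MEM_NS)
    print(f"L3 hit rate {hit_rate:.0%}: {latency:.1f} ns average")
```

The lower the L3 hit rate gets, the closer the average sits to plain memory latency, which is the point being made about a small L3 shared by many cores.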


Also, Intel's L2$ is way faster than their L3 or AMD's L3$. So having that be larger actually does show a speedup over a smaller L2.

News at eleven I guess. Intel's L2 is faster than L3?
 
Feb 17, 2020
108
289
136
Well look at this: https://www.intc.com/files/doc_financials/2020/q2/Q2-2020_Earnings-Release.pdf

"The company's 7nm-based CPU product timing is shifting approximately six months relative to prior expectations.
The primary driver is the yield of Intel's 7nm process, which based on recent data, is now trending approximately
twelve months behind the company's internal target."

Straight from the horse's mouth. Now what was a certain idiot saying about how good Intel's 7nm yields are?
 

mikk

Diamond Member
May 15, 2012
4,308
2,395
136

uzzi38

Platinum Member
Oct 16, 2019
2,746
6,655
146
I am really hoping this doesn't lead to all 7nm projects getting pushed back 6 months, Intel really can't afford to let this keep on happening at all.

This kind of crappy execution is going to seriously hurt the entire x86 market.

No joke, if I were AMD I'd actually start thinking about resuscitating K12.
 
  • Like
Reactions: Lodix

mikk

Diamond Member
May 15, 2012
4,308
2,395
136
Ponte Vecchio to feature external process tech.

Those words speak for themselves. Unfortunately.

Did they say this? This is huge if true, it basically confirms Intel will switch to TSMC for Gen12 HP products. It sounds strange but this is good news after all.
 

ajc9988

Senior member
Apr 1, 2015
278
171
116
I really hope you don't believe AMD (or Intel or anyone else) has an L3 cache with 40ns of latency. It is ~11ns versus ~17ns for Intel, and it is reasonable to expect that latency will increase some 3-4ns for 32MB of unified L3 on 8C chips with an unknown "architecture". It might still be much faster than Intel's server chip L3. That depends on ICL-SP uncore clocks, hop count and the optimizations Intel put into the chip.



On SKL-SP memory latency is not exactly great either, even when running a 3.2GHz uncore; at 2.4GHz it is anemic in latency, bandwidth and cache size. Server chips run 2.4GHz and 6-channel controllers split into 2 clusters, so average latency is not exactly stellar. ~90ns is what AnandTech found for Intel and 110ns for AMD's Rome.



What is the point of L3 cache? To let a workload with working set larger than L2 cache size spill over to cache instead of going all the way to the memory that has 90ns latency and has limited BW. Except on Intel L3 is just wrong:
1) L3 is small, so cores compete for a very limited resource
2) Even worse, L3 bandwidth is anemic and "comparable" to memory bandwidth. That is, once you are above L2, you might as well go to memory, because several cores will eat the BW.
3) L3 is competing for chip wide resource -> mesh bandwidth. So core that is using L3 might hurt other cores if some bottleneck on mesh arises.




News at eleven I guess. Intel's L2 is faster than L3?

As pointed out by another poster, whom I thank, I meant cycles, not nanoseconds. Chalk it up to a brain fart. Thank you for catching it and pointing it out. Sincerely.

Now, generally you are right on cache, wanting L3 to be larger to catch as a victim cache before going off chip. L2 is significantly faster though, by multiples. As such, having as large an L2$ as possible before the latency hits is beneficial. I doubt Intel's engineers would do it willy-nilly.

But you have a good point that no matter what, L3 is faster than off chip, which is much slower. You are also very correct to point to the speed of the uncore affecting it. And you are also correct that for many commercial/data center/HPC uses the latency matters much less.

But, looking at overclocked data on hwbot with a 3175 Xeon W at 4GHz core with memory at 4GHz, the bandwidth was 160GBps, whereas the L3 was about double that. Latency was lower, but that is the uncore. But I did want to mention that L3 and memory are not equally fast at the moment, save for latency. Now, this may change with DDR5 octo-channel or HBM incorporation onto a die. But that is a different discussion.

As to Intel's mesh topology, that, too, is another discussion worth having.

Edit: to be clear, I do not have a modern Intel server chip to compare stock values in a common program to check these values at the moment, so at stock the memory and bandwidth on L3 could match between the two on other chips in other circumstances.
 

ajc9988

Senior member
Apr 1, 2015
278
171
116
40 and 47 are not ns, these are cycle counts. Then you have to factor in the clock speed, i.e. a Zen 2 at 4 GHz would have exactly 10ns of L3 cache access latency.
Yeah, brain fart. My 1950X gets 9.9ns latency on L3. Misspoke.
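For reference, the cycles-to-nanoseconds conversion is a one-liner. A minimal sketch, using the 40 and 47 cycle figures and the 4 GHz clock mentioned above; the function name is just for illustration.

```python
def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in clock cycles to nanoseconds.

    One cycle at f GHz lasts 1/f ns, so latency_ns = cycles / f.
    """
    if clock_ghz <= 0:
        raise ValueError("clock_ghz must be positive")
    return cycles / clock_ghz


for cycles in (40, 47):
    print(f"{cycles} cycles @ 4.0 GHz = {cycles_to_ns(cycles, 4.0):.2f} ns")
# 40 cycles @ 4.0 GHz = 10.00 ns
# 47 cycles @ 4.0 GHz = 11.75 ns
```

The same cycle count costs more nanoseconds at a lower clock, so a cycle figure only means something alongside the frequency it was measured at.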
 

DrMrLordX

Lifer
Apr 27, 2000
22,945
13,031
136
Intel confirmed ADL is based on 10nm, both desktop and mobile: https://s21.q4cdn.com/600692695/files/doc_financials/2020/q2/Q2-2020_Earnings-Presentation.pdf

No surprise but important to note because the usual suspects claimed it's 14nm.

Did anyone seriously entertain the idea that a product as late as Alder Lake-P or -S would be 14nm? I mean I'm pretty skeptical about Tiger Lake-H (still waiting) but come on, Alder Lake has always been a 10nm product, and Tiger Lake-U would appear to demonstrate that Intel is capable of producing at least something on 10nm.

Unfortunately Intel struggles once again with a new node, 7nm CPU products have been delayed by 6-12 months. I wonder if they have to delay GPU products as well, I mean Ponte Vecchio was their 7nm lead product.

Words can't even begin to describe how utterly awful this news is. Today may mark the day that Intel's days as an IDM sputter slowly to a halt. At least they still have a lot of cash and revenue (for now).

It must take a lot of patience for a chip architect to work for Intel these days. Clearly, Jim Keller had none.

While this news would appear to lend credence to the idea that Keller was pushing for Intel to move their CPU designs to TSMC nodes, I'm still going to defer to Keller's own announcement out of respect. If he really wanted us to think he was going to bail on Intel like that, he would have just come out and said something about "professional differences" and left it at that. Or Intel would have pulled a BK on him and accused him of sexual harassment/an affair.

Hmm, could be Samsung too.

Raja sure looks happy about something.
 