Discussion Intel current and future Lakes & Rapids thread


JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
No gigantic bandwidth is needed between CCDs because they all share the same universal MC. And the diminishing returns of using remote caches simply aren't worth it.

Don't you realize that Intel pays a similar price in power consumption for their EMIB, just because they need truckloads of bandwidth to make their topology work at all?

We had this type of discussion already with Alder Lake; Intel seemed to have a very stupid L3 cache setup with very high latency for its size compared to AMD. Fast forward to RPL, and suddenly their L3 is performing better due to the larger L2 cache shielding it from traffic and much faster uncore speeds.
What we have with SPR is a compromise: an anemic L3 cache that does not show up well in action. If they are able to increase the size substantially in the future, they will reap the benefits, because the latency price is already paid.

So Intel's chip-level architecture is more advanced now, and if they can throw in way more transistors in the form of cores and caches, they will have a performant chip. That rests on their process execution ability.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
We had this type of discussion already with Alder Lake; Intel seemed to have a very stupid L3 cache setup with very high latency for its size compared to AMD. Fast forward to RPL, and suddenly their L3 is performing better due to the larger L2 cache shielding it from traffic and much faster uncore speeds.

They are all experimenting and learning as they go. When you are on the cutting edge, you will make mistakes, but it's inevitable.

And what works for one company won't work for another. Gotta think dynamically and adjust based on your situation.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
Skymont was always an Atom core name. Though that should be a particularly fun one.

Way, way back Skymont was mentioned for Core lines. Think Nehalem era, maybe Sandy Bridge.

Yeah, it seems reasonable enough. Even from just a die size perspective, it's around what you'd expect from 2-3 large compute dies.

I am asking because 80 cores are not going to be competitive. They need the 120 core setup. A mere 40% gain over the shoddy EMR chip, really? Sure it'll be for -AP, but that's a different class altogether. Cascade Lake-AP who?
 

DrMrLordX

Lifer
Apr 27, 2000
21,620
10,829
136
TBH, I expect one of the next EPYC to look a bit like this.

It may not be entirely relevant to the discussion, but that mockup of hypothetical Zen5 on AM5 raises some questions. To date, AMD has always oriented their CCDs horizontally with respect to the I/O die on consumer chips. Kinda makes me wonder if there is a technical reason for that. Also placing 3 CCDs that close together on a high-frequency consumer chip may spell bad news.
 

BorisTheBlade82

Senior member
May 1, 2020
663
1,014
106
It may not be entirely relevant to the discussion, but that mockup of hypothetical Zen5 on AM5 raises some questions. To date, AMD has always oriented their CCDs horizontally with respect to the I/O die on consumer chips. Kinda makes me wonder if there is a technical reason for that. Also placing 3 CCDs that close together on a high-frequency consumer chip may spell bad news.
It surely had to do with the routing on the package.
There is precedent for the tight die placement: just look at N31, MI300, PVC and MTL. With advanced packaging, you want the dies as close together as possible.
 

A///

Diamond Member
Feb 24, 2017
4,352
3,154
136
Short of a CCD redesign incorporating 10 or 12 cores per CCD, shoving them up one against the other would be another way. You're facing heat density whichever path you take. A larger substrate may make sense in the future, but as it stands there are too many components on typical motherboards to allow for a larger processor. When designing for the typical consumer they also have to take into account that third parties may release an ITX board. Unless TSMC has some breakthrough packaging design we don't know about, I don't know how AMD will increase core counts outside of those two options, unless they can further shrink their cores, make them even more powerful and energy efficient, and thus fit more cores in the current CCD design.
 

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
The Granite Rapids SP does not look that impressive: 80 cores, 8 channels. Granite Rapids should have been a 2022-2023 processor.
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
I am asking because 80 cores are not going to be competitive. They need the 120 core setup. A mere 40% gain over the shoddy EMR chip, really? Sure it'll be for -AP, but that's a different class altogether. Cascade Lake-AP who?
I don't think GNR-AP and Cascade Lake AP will be remotely comparable in volume. Imo, seems like a pretty straightforward lineup. For the highest density and higher performance use cases, there will be GNR-AP. This is probably what the CSPs will use. For the rest of the market (of which there is a large amount of volume), GNR-SP will offer substantial gains over EMR with similar platform cost, TDP, form factors, etc.

Notice that with AMD's first 12-channel platform, they too are adding a scaled-down version. Recently they've had the luxury of being able to target only the highest-paying customers, but as the competitive situation gets a little more even and their market share gradually rises, expect to see them focus more on Siena's successors.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
@Exist50 What you are saying is Intel is basically going back to the E5/E7 era.

It made sense, and they should have done that a while ago. There was a very clear distinction between the two, however. The E7 had massive memory support but traded it for latency. It had more cores and redundancy features but took longer to arrive. And it was more expensive.

The E5 was lower latency but didn't support as much memory. It came to market faster, and its RAS capabilities weren't as extensive.

Don't you realize that Intel pays a similar price in power consumption for their EMIB, just because they need truckloads of bandwidth to make their topology work at all?

Well, no, and @JoeRambo is right. The big tradeoff for Intel is that their approach is harder to do.

Each tile has a memory controller, so it doesn't need to pay that tax. Plus EMIB is lower cost than IFOP. If they decide to use it for higher bandwidth, they get the benefit of better performance. Saying that EMIB's power per bit doesn't matter because it needs to move more bits makes no sense, since the extra bandwidth results in better performance.

Also, in Intel's case it only applies if they need data from adjacent tiles and memory controllers. We know the performance difference from a super-high-bandwidth interconnect is critical in certain segments like Enterprise. That's why Intel's traditional stronghold is transactional databases, and why they outperform relative to their core count.*

That's in theory, of course. Lakefield proves that in practice even Foveros may not be beneficial, because silicon has high variability, and if the team is screwing up, they can lose all the benefits.

*Read about the differences between Nehalem-EP and -EX. The -EX changes are almost foreign to us; they are there for the sole benefit of the Enterprise market. IBM's POWER line is also strong there, and their focus is almost to the point of obsession. It's a traditional big-iron thing.
 
Last edited:

BorisTheBlade82

Senior member
May 1, 2020
663
1,014
106
@IntelUser2000
Okay, maybe we should first establish some common ground. Tell me whether you agree or disagree with the following aspects:
  • Intel's tiles have a lot of crosstalk because of the distributed MCs and the fact that you can never know which part of the RAM gets used by which core on which tile.
  • In theory, they have the big advantage that a single core can use the bandwidth of all MCs combined. After several reviews, I fail to see a significant number of common DC workloads where that makes a difference.
  • Additionally, they have a lot of crosstalk because of the ability to use remote L3s. But because of the rather small bandwidth and the awkward latencies, there again seems to be no significant benefit. On top of that, it gets trumped significantly by AMD's V-Cache.
  • The crosstalk adds up, so that despite EMIB being much more efficient per bit transferred than IFoP, the sheer amount of bits transferred significantly weakens that advantage.
  • Although AMD can only use 64/32 GByte/s of RAM bandwidth per CCD, this does not seem to be detrimental in any DC workloads worth talking about.
  • Intel needs to make 2 different types of very big tiles for only a 4-tile SKU. AMD only has 2 kinds of dies, one of which is really small. And they use up to 13 of them in one SKU. That is a brutal production cost advantage.
  • Intel is having a hard time scaling its approach because, aside from geometrical challenges, each further tile significantly increases the bandwidth needed. AMD OTOH can just add as much computing power as they like without big difficulty.
/edit:
And the really sad thing is that they do not seem to be turning this ship around with Sierra Forest.
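The EMIB-vs-IFoP point above is just "link power = bandwidth × energy per bit." A minimal sketch, with ASSUMED round pJ/bit figures for illustration (neither vendor's exact numbers):

```python
# Back-of-envelope: link power = bandwidth * energy-per-bit.
# The pJ/bit values are assumed placeholders, not confirmed specs.

def link_power_watts(bandwidth_gbytes: float, pj_per_bit: float) -> float:
    """Power in watts for a link moving bandwidth_gbytes GB/s."""
    bits_per_second = bandwidth_gbytes * 1e9 * 8
    return bits_per_second * pj_per_bit * 1e-12

# Hypothetical comparison: a narrow off-package link (IFoP-style, ~2 pJ/bit)
# vs. a wide on-package bridge (EMIB-style, ~0.3 pJ/bit) pushed much harder.
ifop = link_power_watts(64, 2.0)    # 64 GB/s read path per CCD
emib = link_power_watts(500, 0.3)   # assumed aggregate cross-tile traffic

print(f"IFOP-style: {ifop:.2f} W, EMIB-style: {emib:.2f} W")
```

With these placeholder numbers the wide bridge burns comparable total power despite being ~7x better per bit, which is exactly the "sheer amount of bits weakens the advantage" argument.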
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
  • Intel's tiles have a lot of crosstalk because of the distributed MCs and the fact that you can never know which part of the RAM gets used by which core on which tile.
  • In theory, they have the big advantage that a single core can use the bandwidth of all MCs combined. After several reviews, I fail to see a significant number of common DC workloads where that makes a difference.
/edit:
And the really sad thing is that they do not seem to be turning this ship around with Sierra Forest.

Sierra Forest is further different, as the cores are all on one die along with the memory controller, unlike the Rapids chips, which have multiple core tiles.

Yes, there will be crosstalk, but the ability to access caches from other cores is still an advantage over a design that can't.

Most workloads won't show an advantage. The "dark art" shows in Enterprise workloads, like transactional databases. That's why the Xeon E7s were always substantially better than the E5s in that particular workload, despite having a higher-latency memory controller.

AMD and Intel always diverged in this aspect, from way back in the day. Intel is still competing with RISC big-iron, even though x86 has been dominant since long ago.

I think a lot of what you are saying also depends heavily on implementation and execution. Even with an identical product, one that is well executed not only comes earlier but performs better. There is a fixed time cost to final polish, so rushed projects don't get to have it. This difference will easily make up for technical differences.

It's more complex for sure, but they are choosing to do it. And if you look at IBM with POWER, they dwarf what Intel does to cater to that market. Which is why I think the convergence of E5 and E7 with the Skylake generation was kinda stupid. The E5s could cover most of the server volume market, but the E7 was needed for Enterprise customers. E5s are quite a bit simpler and go through a lot less validation, so they are nimbler in terms of timeframe. By converging the two, you risk the entire generation.

The difference between Enterprise and the rest of the market is big enough that they should continue to be separate lines. I hope -SP vs -AP is a return to those roots.
 

BorisTheBlade82

Senior member
May 1, 2020
663
1,014
106
Sierra Forest is further different, as the cores are all on on-die along with the memory controller, unlike Rapids which have multiple core tiles.
From what I gathered, SRF will consist of three compute dies, each with its own MC - not much different from what SPR has. This was also shown again in the last two to three pages of this very thread.

About the rest: yes, of course there are workloads where SPR might shine. But they are so few and far between that I stand by my assessment that Intel heavily dropped the ball on focusing on what the market generally demands.

And just to mention this: I am a big fanboy of Pat Gunslinger and believe that he is the last resort of Intel.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
From what I gathered, SRF will consist of three compute dies, each with its own MC - not much different from what SPR has. This was also shown again in the last two to three pages of this very thread.

About the rest: yes, of course there are workloads where SPR might shine. But they are so few and far between that I stand by my assessment that Intel heavily dropped the ball on focusing on what the market generally demands.

And just to mention this: I am a big fanboy of Pat Gunslinger and believe that he is the last resort of Intel.

Boris, I don't know where you got that. Look at it closer: it's ONE compute die. Granite Rapids has 1, 2, and 3. Whoever tells you otherwise is wrong. The rest are IO dies.

Yea, the Enterprise workloads are THAT different. What I believe they did wrong started with Skylake: unifying E5 and E7 into one. Dedicated lines like back then were the right way to do things. Why did they change it? Probably the desire to extract every last dollar of profit. Ironically, that usually has the effect of achieving the opposite.
 
Last edited:
  • Like
Reactions: Saylick

Redfire

Junior Member
May 15, 2021
6
5
61
Skymont was always an Atom core name. Though that should be a particularly fun one.
Skymont was the original codename for what became Cannon Lake and Palm Cove. Given the naming cadence between Atom and Core, that made sense. Now the Coves and Monts continue all the way.
 

aigomorla

CPU, Cases&Cooling Mod PC Gaming Mod Elite Member
Super Moderator
Sep 28, 2005
20,841
3,189
126
W790 boards seem to be popping up on sales front pages.
It's a $1,000 board. Jebus.
I am still looking for real pricing on the W5-3435X, as well as some actual comparisons and benchmarks, yet I can't seem to find any, and they have supposedly already launched.

The W-2400 series is up on Newegg, which represents the kickstart of Intel's HEDT again.
(Although it's only 64 PCIe lanes.)

Again, I absolutely cannot find any performance numbers or reviews on these chips.
 

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
I am still looking for real pricing on the W5-3435X, as well as some actual comparisons and benchmarks.
The suggested price is $1,500 for 16 cores; if the stock 2495X is slower than a 13900K, the 3435X is much slower. The 5955WX is as good an entry point for HEDT as the 3435X.
 

aigomorla

CPU, Cases&Cooling Mod PC Gaming Mod Elite Member
Super Moderator
Sep 28, 2005
20,841
3,189
126
The suggested price is $1,500 for 16 cores; if the stock 2495X is slower than a 13900K, the 3435X is much slower.

As long as it can keep up with the single-core performance of Threadripper, I'll honestly be happy, as it will supply 112 PCIe 5.0 lanes and not feel like a down payment on a car, as Threadripper has lately.

I guess I could be happy with the 2400 series, but I want the octo-channel RAM support and the extra PCIe lanes on the 3400 series.
But if it's an absolute tank compared to Threadripper, I guess I need to play the waiting game for it to come out.
 
  • Like
Reactions: ZGR and Edrick

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
I guess I could be happy with the 2400 series, but I want the octo-channel RAM support and the extra PCIe lanes on the 3400 series.
But if it's an absolute tank compared to Threadripper, I guess I need to play the waiting game for it to come out.
At stock, the 3400 series is slower than its 5900-series Threadripper Pro counterparts. And they can keep that up to some overclocking extent (water cooled), at about the same power consumption. Their Golden Cove core design gives them higher OC potential, but at that stage you are at 1,000 watts or more.
 

Geddagod

Golden Member
Dec 28, 2021
1,149
1,007
106
What are the chances you guys think Granite Rapids uses Lion Cove on Intel 3? At this point in time at least.
I think it's either that, or slightly tweaked Redwood Cove. If Redwood Cove+ is really as big of an improvement as a new core, then why not just use Lion Cove?
This is probably the most interesting debate for Intel leaks in recent times IMO haha
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
What are the chances you guys think Granite Rapids uses Lion Cove on Intel 3? At this point in time at least.
I think it's either that, or slightly tweaked Redwood Cove. If Redwood Cove+ is really as big of an improvement as a new core, then why not just use Lion Cove?
This is probably the most interesting debate for Intel leaks in recent times IMO haha

Zero. Server chips always take longer to come out, so Lion Cove would be too early. With Lion Cove we're talking a potential 30% gain over Golden Cove. Redwood Cove itself is maybe 3-5%, so the vast majority would come from Lion Cove.

If Granite Rapids were Lion Cove, they could keep core count and frequency the same and we'd see 30-40% gains.

Redwood Cove+ is said to be 10%, which is good but not the huge change that Lion Cove will be.
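The compounding here is easy to sanity-check. Using the post's own estimates (not official figures), with a 4% midpoint for Redwood Cove:

```python
# Compound the claimed per-generation IPC estimates from this thread
# (forum estimates, not official figures).
redwood_gain = 1.04          # midpoint of the ~3-5% Redwood Cove estimate
lion_total = 1.30            # claimed cumulative Lion Cove gain over Golden Cove
lion_over_redwood = lion_total / redwood_gain
print(f"Lion Cove over Redwood Cove: {lion_over_redwood:.2f}x")  # ~1.25x
```

So roughly 25 of the 30 points would come from Lion Cove itself, which is the "vast majority" claim in numbers.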
 

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
So that 34-core Raptor Cove die is actually one half of the Emerald Rapids two-die-per-package CPU with up to 64 cores. I calculated the size of each die to be between 770 and 777 mm², meaning Intel can fit about 68 dies per wafer, with about 60 or fewer functional.
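Those die counts can be roughly reproduced with the standard gross-die-per-wafer approximation plus a Poisson yield model. The defect density below is an ASSUMED value for illustration; the real figure for this node isn't public:

```python
import math

# Rough check of the die-count claim above: gross dies per 300 mm wafer
# for a ~770-777 mm^2 die, then a Poisson defect-free fraction.

def gross_dies(die_area_mm2: float, wafer_diameter_mm: float = 300) -> int:
    """Common gross-die-per-wafer approximation with an edge-loss term."""
    r = wafer_diameter_mm / 2
    return int(math.pi * r**2 / die_area_mm2
               - math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2))

def poisson_yield(die_area_mm2: float, d0_per_cm2: float) -> float:
    """Fraction of defect-free dies at defect density d0 (defects/cm^2)."""
    return math.exp(-d0_per_cm2 * die_area_mm2 / 100)

area = 773                                # midpoint of the 770-777 mm^2 estimate
dies = gross_dies(area)                   # ~67, matching the ~68 above
good = dies * poisson_yield(area, 0.02)   # assumed 0.02 defects/cm^2
print(f"gross dies: {dies}, defect-free: {good:.0f}")
```

At the assumed defect density this lands in the same ballpark as the ~60 harvested dies mentioned above; salvaging partially defective dies (disabled cores) narrows the gap further.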

 
  • Like
Reactions: lightmanek