Discussion Intel current and future Lakes & Rapids thread

Page 659

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
~50 10 mm² cores can be put on a 650 mm² tile that uses EMIB to connect to another such tile and an "IO tile". The extra 150 mm² is for EMIB support and accelerators. And let's assume 10-15% of cores have to be disabled due to defects. That's 85-90 functioning cores. Is that enough? In 2024??
Why assume they're limited to two such tiles? One of the graphics shown above includes three. Purely in terms of core count, that should make Intel reasonably competitive.
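As a rough sanity check on the arithmetic in this exchange, here is a minimal sketch. The ~50 cores per tile and the 10-15% disabled fraction come from the quoted post; the function name, the integer rounding (down), and the assumption that all compute tiles are identical are illustrative only.

```python
def functioning_cores(tiles, cores_per_tile=50, disabled_percent=10):
    """Estimate usable cores when a percentage is disabled for defects.

    Integer math is used so the result rounds down (a conservative estimate).
    """
    total = tiles * cores_per_tile
    return total * (100 - disabled_percent) // 100


for tiles in (2, 3):
    worst = functioning_cores(tiles, disabled_percent=15)
    best = functioning_cores(tiles, disabled_percent=10)
    print(f"{tiles} compute tiles: {worst}-{best} functioning cores")
```

Under these assumptions, two tiles reproduce the 85-90 figure from the quote, and a third tile scales that range by 1.5x.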

People here don't even believe Intel can yield a 400 mm² SPR die on the now fairly mature Intel 7. Is a full year of yield learning good enough to have such high confidence in yielding a ginormous die? There is also plenty of learning Intel 3 has to go through for new features. And not everything is going to be orthogonal there.
I've said it many times now, but those blaming yields for SPR are simply incorrect. No sense in humoring that train of thought further.

As for Intel 3 vs 4, for a compute tile, the new libraries wouldn't make much of a difference, and they clearly thought they could do it on Intel 4 originally. Might even be design compatible.

How does one have a passive interposer/base die with cache on it? Don’t the cache transistors need switching??
If you absolutely require bigger caches to support the large number of cores, what other options does one have, other than stacking? And, what about memory accesses? Doesn’t that need to be low latency?
I should clarify. When I was talking about a passive interposer, I meant just to consider how many raw wafers extra that would require. Intel ships a lot of server chips. Is that much spare wafer capacity sitting around?

And then if you want an active interposer, well now you have to find that same capacity on a modern-ish node (depending what you want on there). With the chip shortage in mind, is that possible? And keep in mind that there's a balance here. You could theoretically move the L3 and IO to the base die, but then you would want it to be a modern process to benefit from high SRAM density, high performance for fast IO, etc. A pure SLC/memory-side cache die could be much cheaper, but then you probably need at least the memory controller and L3 on top for performance.

Also, Foveros wouldn't really save memory latency. You'd still want the controller and PHY at the edge, as you would on a monolithic die, but then you also have the overhead of die-die hop.

Falcon Shores etc, being intermediate steps to Zetta scale, are really large designs. Making them 2.5D would make the socket stupendously huge
Yeah, these would be huge designs, but is that a problem? Birch Stream is rumored to be what? 7529 pins? Seems to fit!
 

Timmah!

Golden Member
Jul 24, 2010
1,418
630
136
Their HEDT segment is dead. It's called Xeon Workstation now.

This new CPU, the Xeon W-3433, is the natural successor to the Ice Lake Xeon W-3335 (also 16 cores), which lags behind the Zen 2 3955WX. The new W-3433 might be a good match for the 5955WX but will be no match for the 7950WX.

View attachment 63337


View attachment 63338



at only 16 cores it could probably clock high enough to come close to 7950x in performance.

edit: misread 7950wx as 7950x, my bad. Still, 16 core Zen4 product, only TR one, right?
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
Isn't Intel still claiming to have 20A online for production in 2024? Not that, you know, that's believable.
I actually believe 2024 for 20A is reasonably achievable. But even if it is, larger server products tend to require at least another few months. Or to approach it from another perspective, we know Intel's [supposedly] launching a server product on Intel 3 in 2024. You don't honestly think they could get out another generation (on a new process) within the same year, right?

@Exist50 Don't know why you think it needs Intel 3 to be competitive with TSMC N4. SemiWiki and WikiChip already say Intel 4 is between N5 and N3 and is closer to N3. Perhaps in density you are right, but Intel 3 is another 18% gain in performance; that's a gain equal to N7 to N5 and N5 to N3.
Server is about low to mid voltage, and contains a lot of SRAM, two of Intel's greatest weaknesses vs TSMC. Maybe Intel could compete via Intel 4, but I think their chances are far better with Lion Cove and Intel 3.

Remember I said Sapphire Rapids HBM is BGA.
Has that been confirmed anywhere? I thought it uses the same socket as normal SPR.

Remember how we talk about how Golden Cove is inefficient in die area? I think they'll make this better generation after generation.
I think Lion Cove would be the gen to fix this.

Ian says Granite Rapids is using HD libraries, which don't exist for Intel 4.
Has that been confirmed by Intel? I have at least some suspicions...

Good density gains will also come with backside power delivery on Intel 20A using PowerVia, better than N2's Buried Power Rails.
PowerVia and Buried Power Rails would probably be minor improvements for density for this generation. But the CPU folk in particular will probably be ecstatic, because they'll help immensely with routing the PDN.
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
Ok, figure I might as well give my take on what GNR is. I see three possibilities:

The first two would look something like this:

1655664609532-png.63310


Option A) Compute tile on Intel 3, Combined IO+Mem tile on N-1 node.

Option B) Combined Compute + Mem tile on Intel 3, IO tile on N-1 node.

The last one would look more like this:

1655664643911-png.63311


Option C) Compute tile on Intel 3, Mem tile on N or N-1 node, and IO tile on N-1.
And here's how I think they'd stack up in relative terms.
Option   Complexity   Flexibility   Performance
A        low          mid           low
B        mid          low           high
C        high         high          mid

I'm guessing either B or C, with a slight preference towards B.
 

ashFTW

Senior member
Sep 21, 2020
307
231
96
I think Intel 7 is more likely. Both for reuse from SPR and keeping the substantial wafer volume internal to Intel. If there were a separate Mem tile, I could see it going either way.
Look at the bottom chip below. That’s also a future Xeon; FS will be using Xeon socket. So Intel has communicated 3 future Xeon designs. Maybe this is after Granite and Diamond, but I would really like them to get there sooner.

1655743982528.jpeg

Ok guys, I'm going to detach from here and go do some real work. It was nice discussing with everyone here. Take care!
 

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
at only 16 cores it could probably clock high enough to come close to 7950x in performance.

edit: misread 7950wx as 7950x, my bad. Still, 16 core Zen4 product, only TR one, right?

Perhaps, but we have seen how Xeons are affected by mesh latency compared with the ring bus (ring vs. mesh in gaming).


Is that SiSoftware Sandra benchmark using AVX-512? There is no way that a low-clocked (2.0 GHz) Xeon comes close to the 12900K in single thread while having the same uArch.

1655744853132.png
 

Timmah!

Golden Member
Jul 24, 2010
1,418
630
136
Perhaps, but we have seen how Xeons are affected by the Mesh Ring latency.(Ring vs Mesh gaming)


Is that Sisoft Sandra Benchmark using AVX-512? There is No way that a low clock(2.0 Ghz) Xeon comes close to the 12900K having the same uArch in Single Thread

View attachment 63341

It's low clock because it's an ES, right? I don't see why a lower core count Xeon, a workstation one, pretty much the supposed replacement for Core-X, would not boost past 5 GHz on a single core.

edit: misunderstood you a bit. Yes, I don't think the score can be the same as the 12900K at 2 GHz. It's clearly clocked way higher.
 

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
its low clock cause its ES, right?

misunderstood you a bit, yes, i dont think the score can be the same as 12900k, at 2ghz. its clearly clocked way higher.
That is obviously not an ES as it has been assigned a Release model. It's the Xeon W-3433

The 12900K is getting about 500 points at 5.2 GHz, so that 16C/32T Sapphire Rapids Xeon must be clocking at around 4.7 GHz to get that number.
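The reasoning here can be sketched as a simple proportional estimate. It assumes that, for the same uArch, the single-thread score scales linearly with clock speed, which ignores memory and mesh latency effects. The reference point (about 500 points at 5.2 GHz) is from the post; the Xeon's actual score is only in the attached image, so this sketch works backward from the claimed 4.7 GHz. Function names are hypothetical.

```python
def implied_score(clock_ghz, ref_score=500.0, ref_clock_ghz=5.2):
    """Score expected at `clock_ghz`, assuming linear scaling with clock."""
    return ref_score * clock_ghz / ref_clock_ghz


def implied_clock_ghz(score, ref_score=500.0, ref_clock_ghz=5.2):
    """Clock needed to reach `score`, under the same linear assumption."""
    return ref_clock_ghz * score / ref_score


# Score that would be consistent with the claimed 4.7 GHz clock.
print(round(implied_score(4.7)))
# Score expected if the chip really ran at the listed 2.0 GHz.
print(round(implied_score(2.0)))
```

If the listed result is far above what 2.0 GHz would imply under this model, the chip was boosting well beyond its listed clock, which is the point being made.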
 

Timmah!

Golden Member
Jul 24, 2010
1,418
630
136
That is obviously not an ES as it has been assigned a Release model. It's the Xeon W-3433

12900k is getting about 500 points at 5.2 Ghz so that 16C/32T Sapphire Rapids Xeon must be clocking at 4.7 Ghz to get that number

well the graph you posted says ES-CPU :)
so if it has assigned release number, does it mean the release might be close?
well, i dont see why that xeon would not clock that high.
 

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
well the graph you posted says ES-CPU :)
so if it has assigned release number, does it mean the release might be close?
well, i dont see why that xeon would not clock that high.
One should hope for a near-future release, as the HEDT/workstation segment has been stagnating due to lack of competition.

And the "ES" was assigned by WCCFTECH. The Sisoftware entry appears to be a Release model.
 

Timmah!

Golden Member
Jul 24, 2010
1,418
630
136
One should hope for a near future release and the HEDT/Workstation segment have been stagnating due to lack of competition

And the "ES" was assigned by WCCFTECH. The Sisoftware entry appears to be a Release model.

Yeah, I see, a mistake on WCCF's part that confused me.
I certainly do hope so. It's interesting, though, that there are release SKUs laid out but literally zero info about them except MLID rumors.
 

Hulk

Diamond Member
Oct 9, 1999
4,214
2,007
136
When AMD and Intel make power and performance claims regarding future nodes does this mean that they have produced prototype parts on these nodes and tested them or are these numbers purely theoretical?
 

DrMrLordX

Lifer
Apr 27, 2000
21,620
10,830
136
This new W-3433 might be a good match for the 5955WX but will be No match for the 7950WX

7950WX when, though?

I actually believe 2024 for 20A is reasonably achievable. But even if it is, larger server products tend to require at least another few months. Or to approach it from another perspective, we know Intel's [supposedly] launching a server product on Intel 3 in 2024. You don't honestly think they could get out another generation (on a new process) within the same year, right?

I'm skeptical. We still haven't seen Intel 4 in a commercial product, and we may not until next year. That gives Intel less than two years to actually launch commercial products on Intel 4, Intel 3, and Intel 20A. And as far as enterprise goes, they're still struggling to get volume on 10ESF. Yes, I know "it's a design problem, not a node problem", or so it has been repeated here quite often. Sapphire Rapids doesn't instill much confidence that their future designs will come to market with any greater fluidity.
 

Henry swagger

Senior member
Feb 9, 2022
364
237
86
When AMD and Intel make power and performance claims regarding future nodes does this mean that they have produced prototype parts on these nodes and tested them or are these numbers purely theoretical?
They have test chips in the lab. For example, Intel used an Arm core to test Intel vs Intel 7.
 

dullard

Elite Member
May 21, 2001
25,055
3,408
126
When AMD and Intel make power and performance claims regarding future nodes does this mean that they have produced prototype parts on these nodes and tested them or are these numbers purely theoretical?
One thing to keep in mind is that node performance claims are not the same thing as CPU performance claims. The performance / power curve for a node is relatively straightforward to predict and measure. The performance / power curve for a CPU is complex (depends on size of cache, number of cores, branch predictions, necessary fixes for errata or vulnerabilities, type of software used, memory used, etc).

The design rules of the node are set long before the CPU is designed. Thus, they will know pretty well how the node will perform long before any CPU is even available to test. The power used by a node has three main categories: leakage power, short circuit power, and dynamic power. Each of these parameters is set early on in the process. A very simple test chip can be made and measured to confirm.
  • Leakage power is dependent on the type of transistors: mostly the materials used, the transistor dimensions, and transistor spacing. The closer the transistors and wires are to each other, the less insulation and the more leakage. This is also dependent on voltage used and the transistor temperature. But, once you decide on the transistor design for a node you know the leakage power you will have at each voltage.

  • Short circuit power is dependent on the transistor design. Transistors are not infinitely fast. There is a transition period when they switch states. During this period the input voltage is temporarily connected to ground and the transistor leaks power during this temporary short circuit state. But, once you decide on the transistor design for a node you know the short circuit power you will have at each voltage.

  • Dynamic power depends on the amount of material (mostly metal) and how often it is charged up/drained. Switching a transistor from 0 to 1 requires charging the entire metal portion of the transistor and the wires connecting it. Switching back to 0 drains all that charge away. So, the lower the capacitance of that transistor and its connecting wires, the lower the energy used for those transitions. This dynamic power is related to the voltage (charging to a higher voltage requires more energy, and the relationship is nonlinear) and the number of transitions (proportional to processor frequency). But again, this is all fixed once the node rules for transistor design, metal sizes, and transistor spacing are made.
The dynamic power is the hardest to theoretically predict. But it can be modelled fairly well. The designers will have a pretty good idea what will happen. But, ultimately they will have a test chip to measure the dynamic power long before final CPUs are available.
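The dependence of dynamic power on capacitance, voltage, and frequency described above can be sketched with the standard first-order CMOS switching model, P ≈ activity × C × V² × f. The post does not state this formula; it is swapped in here as the usual textbook approximation. All numeric values below are hypothetical placeholders, not figures for any real node.

```python
def dynamic_power_w(capacitance_f, voltage_v, frequency_hz, activity=1.0):
    """First-order switching power estimate: activity * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * frequency_hz


def total_power_w(leakage_w, short_circuit_w, dynamic_w):
    """Sum of the three power categories named in the post."""
    return leakage_w + short_circuit_w + dynamic_w


# Hypothetical switched capacitance, voltage, and clock.
base = dynamic_power_w(1e-9, 1.0, 3.0e9)
higher_voltage = dynamic_power_w(1e-9, 1.2, 3.0e9)
higher_clock = dynamic_power_w(1e-9, 1.0, 3.6e9)

# A 20% voltage increase costs more than a 20% frequency increase.
print(f"+20% voltage:   x{higher_voltage / base:.2f} dynamic power")
print(f"+20% frequency: x{higher_clock / base:.2f} dynamic power")
```

The quadratic voltage term is the nonlinearity mentioned in the post, while frequency enters linearly through the number of transitions per second.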
 

lobz

Platinum Member
Feb 10, 2017
2,057
2,856
136
Even though I rarely agree with your biased opinions (most probably because of my own bias? 🙄), this post was nothing short of great!
 

dullard

Elite Member
May 21, 2001
25,055
3,408
126
Even though I rarely agree with your biased opinions (most probably because of my own bias? 🙄), this post was nothing short of great!
Thanks. It may be a bit overly simplistic. Anyone can feel free to correct me and I'll edit the post.

I try to stay neutral on the AMD/Intel question in my posts. This is because I think they are both on the same x86 team against ARM. But I do have my own biases and probably am not always as neutral as I want to be. I do try to compliment both AMD and Intel chips and criticize both companies when necessary.
  • AMD right now has the best HEDT chip (the 5950X is really, really great if you need high performance and high efficiency) and the better server chips. They are great and dominating for a reason. But I also think AMD chips come with a significant drawback: for the last 2 years, no iGPU meant doubling the price of a computer for the average consumer, and putting in a high-power GPU cancels out a lot of the efficiency gains that you thought you got.
  • Intel has some great technology. I do think efficient cores are a must when we get to 32+ core chips, because mathematically there just isn't enough power to give to P cores. Their mix of P and E cores WILL be great in the future, but right now it is off the mark, especially with the Alder Lake i7 chips. The Alder Lake desktop i7 chips should be avoided like the plague. Any Celeron chip should be avoided too, as the performance just isn't there.
  • But, ultimately, if you look closely at benchmarks, the companies trade off wins here and there. They both are doing good things and making great chips. I will use either company as the situation calls for it. I'm typing this on a Ryzen and I have an Alder Lake in the mail as we speak (12600T).

What I will do is look right through any BS that people love to fight over. The more BS that Intel fans spread, the more I'll defend an AMD chip. The more BS that AMD fans spread, the more I'll defend an Intel chip. Thus, my posts will swing from one side to the other over the years depending on which chip needs to be defended.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,785
136
When AMD and Intel make power and performance claims regarding future nodes does this mean that they have produced prototype parts on these nodes and tested them or are these numbers purely theoretical?

Node claims are basically transistor level. So when they say performance, it means the transistor can drive that much more current. When they say power, it means it can use that much less power at the same drive current as the predecessor.

Also, you need to normalize between transistor types. There are high threshold voltage transistors that need a higher voltage but have a fraction of the leakage current. The lower threshold voltage ones are the opposite. There are also high voltage transistors for I/O.

The drive current and power numbers are with voltage and transistor types normalized.

@dullard went into greater detail but in an actual device you may have a combination of all the above. High voltage increases dynamic power but lower leakage will result in lower idle and sleep power. If you look at chips for example, the U chips use more power per clock than the H chips do but the U will have much lower idle power. The S chips take that even further because of the need to reach extremely high frequencies.
 

dullard

Elite Member
May 21, 2001
25,055
3,408
126
Congrats! So you finally got your dream chip. Looking forward to review/benchmarks soon! How much did it set you back?
I'll be glad to do a couple benchmarks this weekend. Any free to download ones you want me to do? This computer isn't anything special so it would not break any speed records. There actually aren't very many 12600T benchmarks at all.

It set me back $1106. I'm kicking myself since if I just waited 3 weeks it would be almost $50 cheaper (HP doesn't price match the Elite line). I could have also gotten the same specs for about $1000 but I didn't want to take the time for this computer to open it up and swap memory/hard drive/reinstall Windows (tiny computers are not pleasant to work in).
  • HP Elite Mini 600 G9 - 35W
  • Intel® Core™ i5-12600T Processor (2.1 GHz, up to 4.6 GHz w/Boost, 18 MB cache, 6 core, 35W) + Intel® UHD Graphics 770
  • Windows 11 Home
  • 16 GB (2 x 8 GB) DDR5-4800 SODIMM Memory
  • HP 3 year Next Business Day Onsite Desktop Only Hardware Support Warranty Extension
  • Add on card: Type-C USB 3.1 Gen 2 Port with 100W Power Delivery from Display (v2)
  • Add on card: 2 x Type-A USB 2.0 I/O
  • Add on: 35 W SATA Cage (I'll eventually put in a large drive from the computer that this computer is replacing)
  • 512 GB PCIe 4x4 2280 NVMe TLC SSD
  • Intel® Wi-Fi 6E AX211 + Bluetooth® 5.2 - External Antenna
I nearly got an ASRock NUC BOX-1260P for about the same price with a much better drive, but I didn't want to figure out and pay customs fees from AVADirect. Very few other tiny computers are available, even though they were announced 3 to 5 months ago.