Discussion Intel current and future Lakes & Rapids thread


eek2121

Diamond Member
Aug 2, 2005
3,384
5,011
136
Let me rephrase - it's literally marketing. Intel could call its nodes whatever it likes. I don't think Intel has made any actual claims yet as to density or quality.
Yes, but one thing I'll bet money on: Intel 7 is at least as good as TSMC N7, and Intel 4 will be at least as good as TSMC N4, and so on...

Quotes are confidential. You would know that if you had ever dealt with such things. I am done talking with you. Waste of energy.
No, they aren't, unless you are big enough to order tens of thousands of units from Intel directly. Once you involve a third party, this is no longer the case. I can absolutely tell you that the price on Intel ARK is on the high side compared to the servers/hardware I can get from Dell. Often I can get a full system for less than that price.
AMD is not paying you, so stop with the cheerleading lol
AMD has a better core with higher perf/watt. There may be some pro-AMD folks here, but the reality is Intel has fallen behind in absolute performance, in perf/watt, and in compute density and compute density per watt.

People are right to give Intel a hard time.

Intel is likely to right the ship eventually. They always do. Until then, though, bloody noses for everyone!
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
Integer performance: Integer performance is very difficult to increase, and improvement in integer performance indicates the "quality" of the core. It is a complex combination of latency, frequency, balance of execution units, and branch prediction (and pipeline length) across all areas. You cannot make the L1 cache too big, as that increases latency, but it can't be too small either. You cannot just increase the number of branch targets either; latency comes into play, and so does the performance of the prediction algorithm. Decoders won't scale without beefing up the rest of the machine.

How do you increase the top speed of a car from 200 km/h to 300 km/h? Just double the engine displacement and horsepower? No. Aerodynamics need to be greatly improved, since wind resistance is the limit at high speeds. You need the transmission to keep up so it does not fail, and to shift quickly and seamlessly. And you need to do all that while making the car lighter. You need a capable driver, because otherwise an accident might happen. Not a simple solution at all.

The 787 achieved a 20% reduction in fuel consumption by moving to composite materials rather than just aircraft aluminum for the airframe. Then they had to move to lighter lithium battery technology. The engines were replaced with bigger, slower-running high-bypass turbofans. And they had to do aerodynamic work using CFD as well.

That's why integer performance is the most important aspect of a general-purpose CPU. It is said that in order to chase a 1% improvement in performance, CPU architects did what can only be called "heroic" work. It's amazing what they are doing now. The work pays off, though. Improving integer performance, or the uarch in general, benefits everything: deep learning, floating point, word processing, gaming, emulation, snappiness.

That's why it's absolutely foolish when mega-corporations mistreat employees, especially veteran ones. These kinds of decisions require very seasoned, experienced architects with 30+ years of experience. That's an entire lifetime doing nothing but being a CPU architect, and at the top of the field at that!

Density: The L2 array is not the LLC (Last Level Cache) anymore and is effectively part of the core's cache. It has more stringent voltage and power requirements, so the cells used aren't exactly the bog-standard ones used for L3.

They could have improved the density aspect in Redwood Cove compared to the competition, making it more favorable against chips such as Zen 4. This is me speculating based on what you said, though.

Also, there is a fundamental limit. A company that does a better job of scaling down will reach the limits faster than those that don't. You could argue the limits "democratize" compute and make the latest technologies available to small companies and those with fewer resources.

That's why DRAM has been on the 10nm-class node (10-19nm) for 5 years now: 10x, 10y, 10z, 10a, 10b, and they even talk about 10c! They will reach a decade on it, since Micron just announced 10b (1-beta) availability as the most advanced node. 10x = 17-18nm depending on the manufacturer.

Very much a possibility the designations are as follows:
10x = 17-18nm
10y = 16-17nm
10z = 15-16nm
10a = 14-15nm
10b = 13-14nm
10c = 12-13nm

1nm improvement per "generation".

Why? Because DRAM on a 10nm-class node is far, far denser than DRAM built on a logic process (3x the density of eDRAM), and that in turn is denser than logic cells. TSMC is showing almost no gains for SRAM on the N3 node, meaning even SRAM is hitting hard limits. Logic isn't hitting them yet because it's less dense.
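To put rough numbers on that density ladder, here is a minimal sketch. The bitcell areas are approximate, publicly reported figures (commodity DRAM as a 6F² cell at a ~16nm half-pitch, Intel's 22nm eDRAM and high-density SRAM cells, TSMC's N5/N3E high-density SRAM cell); treat them as illustrative, not exact.

```python
# Rough bits-per-mm^2 comparison of DRAM, eDRAM and SRAM bitcells.
# Cell areas are approximate, publicly reported figures; real arrays are
# less dense because of sense amps, decoders, redundancy, etc.

bitcell_um2 = {
    "Commodity DRAM (6F^2, F ~ 16nm)": 6 * (0.016 ** 2),  # ~0.0015 um^2/bit
    "Intel 22nm eDRAM":                0.029,
    "Intel 22nm HD SRAM":              0.092,
    "TSMC N5 HD SRAM":                 0.021,
    "TSMC N3E HD SRAM":                0.021,              # ~no shrink vs N5
}

for name, area_um2 in bitcell_um2.items():
    mbit_per_mm2 = 1.0 / area_um2   # bits per um^2 == Mbit per mm^2 numerically
    print(f"{name:32s} ~{mbit_per_mm2:6.0f} Mbit/mm^2 (bitcell only)")
```

The point is just the ordering, not the exact values.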
I think you're being a touch dramatic here. On the topic of CPU architecture and IPC, 1% is easy. You can get that by slightly increasing the sizes of existing structures or adding a few smaller features throughout the pipeline, all without materially impacting your timing. 10-20% is harder, but typical of generational changes. Beyond that is where things start to get truly interesting.

As for SRAM and process technology in general, we're far from any fundamental limits. Just because the gains are somewhat lackluster for this one gen doesn't mean we won't see significant improvement in the future. Plenty of room left at the bottom.
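As a quick illustration of how those generational numbers compound, here is a trivial sketch (the percentages are just examples, not claims about any particular core):

```python
# How per-generation IPC gains compound over several generations.
# Percentages are purely illustrative.

def cumulative_ipc(gain_per_gen: float, generations: int) -> float:
    """Cumulative IPC multiplier after N generations of equal gains."""
    return (1.0 + gain_per_gen) ** generations

for gain in (0.01, 0.10, 0.20):
    print(f"{gain:.0%} per gen over 5 gens -> {cumulative_ipc(gain, 5):.2f}x")
```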
 
  • Like
Reactions: Tlh97

Geddagod

Golden Member
Dec 28, 2021
1,406
1,527
106
In hindsight, that leak about Granite Rapids introducing higher bandwidth per core suddenly becomes way more important, if they were referring to L3 bandwidth.
Because the way I see it, this was the conclusion of the Chips and Cheese testing of SPR:
L3 latency is high and bandwidth is mediocre. If SPR faces an all-core load with a poor L2 hit rate, multi-threaded scaling could be impacted by the limited L3 bandwidth.
SPR has slightly higher caching capacity but far worse performance characteristics, which will make its advantages even less widely applicable. The L3 capacity flexibility is overshadowed by AMD’s L3 performance advantages, and EPYC has so much L3 that it can afford to duplicate some shared data around and still do well.
I think the main issue however is the lack of L3 bandwidth, not necessarily the high L3 latency.
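A crude way to see why aggregate L3 bandwidth, rather than latency, can be the wall: divide whatever the shared L3 can sustain by the number of active cores and compare that with what each core wants when its L2 hit rate is poor. This is a toy model; the numbers are placeholders, not measured SPR or Genoa figures.

```python
# Toy model: per-core share of shared L3 bandwidth vs. per-core demand.
# All numbers are placeholders for illustration, not measurements.

def per_core_share(aggregate_l3_gbs: float, active_cores: int) -> float:
    """Fair-share L3 bandwidth per core in GB/s."""
    return aggregate_l3_gbs / active_cores

aggregate_l3_gbs = 800.0   # hypothetical sustained bandwidth of the whole L3
active_cores = 56          # hypothetical core count
demand_gbs = 25.0          # hypothetical L2-miss traffic per core

share = per_core_share(aggregate_l3_gbs, active_cores)
print(f"per-core L3 share : {share:.1f} GB/s")
print(f"per-core demand   : {demand_gbs:.1f} GB/s")
print("L3-bandwidth bound" if demand_gbs > share else "not L3-bandwidth bound")
```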
 
  • Like
Reactions: wilds

Doug S

Diamond Member
Feb 8, 2020
3,323
5,800
136
There are actual numbers of the node itself, which are measured the same way across foundries, that you could use to compare Intel 4 and TSMC nodes.


There are no "actual numbers for the node itself" for Intel 4 or anything newer because Intel isn't shipping those processes in quantity yet. The same is true for TSMC's N3/N3E until that ships in volume. There were supposed numbers for Intel's 10nm years ago, which proved to be fables by the time the actual 10nm process shipped in volume. What matters is what actually ships, not what numbers they come up with during development and crow about in ISSCC.

They are also in no way whatsoever "measured in the same way across foundries". If you think that, I have a bridge I'd like to sell you.
 

Geddagod

Golden Member
Dec 28, 2021
1,406
1,527
106
There are no "actual numbers for the node itself" for Intel 4 or anything newer because Intel isn't shipping those processes in quantity yet. The same is true for TSMC's N3/N3E until that ships in volume. There were supposed numbers for Intel's 10nm years ago, which proved to be fables by the time the actual 10nm process shipped in volume. What matters is what actually ships, not what numbers they come up with during development and crow about in ISSCC.

They are also in no way whatsoever "measured in the same way across foundries". If you think that, I have a bridge I'd like to sell you.
There are: theoretical max transistors per mm², which is comparable across many different foundries.
[Attached image: 1678686744999.png - peak transistor density comparison]
And here's the same for SRAM
[Attached image: 1678686774446.png - SRAM cell size comparison]

Yes, Intel 4 and TSMC 3nm aren't shipping in volume yet, but we know the size of each of the cells because both companies have disclosed those numbers.
"Oh, but no company ever reaches the theoretical max density" - well yes, because they use a different mix of logic cells and SRAM cells. That doesn't make the metric any less useful for comparing node to node.
The thing about your 10nm example is that TechInsights (I believe) actually did find Cannon Lake hitting ~100 MTr/mm² in some parts of the chip.
As for the higher-volume 10nm ESF parts: despite the fact that ESF increased the gate pitch, the overall density of the HP cells actually increased, because fewer buffers were needed between the HP cell libraries, according to Intel.
The numbers Intel quotes in its slides are essentially the same ones that ship. Intel just seems to use HP cells as its standard library, while AMD uses HD cells (surprisingly, even for Zen 4 on 5nm).
Probably the only reasonable caveat to bring up would be Muh Design Rules, but that's about it.
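For what it's worth, the "theoretical max transistors per mm²" figures in charts like the one above are usually derived from standard-cell geometry. One widely cited scheme is Intel's weighted-density metric (0.6x a 2-input NAND plus 0.4x a scan flip-flop, each as transistors divided by cell area). A minimal sketch of that arithmetic, with placeholder cell areas rather than any published Intel 4 or N3 values:

```python
# Weighted transistor-density metric in the style Intel proposed (Bohr, 2017):
#   density = 0.6 * (NAND2 transistors / NAND2 area)
#           + 0.4 * (scan-FF transistors / scan-FF area)
# Cell areas below are placeholders to show the arithmetic only.

def weighted_density_mtr_mm2(nand2_area_um2: float, scanff_area_um2: float,
                             scanff_transistors: int = 36) -> float:
    """Million transistors per mm^2 for a 60/40 NAND2 / scan flip-flop mix.

    Transistors per um^2 is numerically equal to MTr per mm^2.
    The scan flip-flop transistor count is an assumption.
    """
    nand2_transistors = 4
    return (0.6 * nand2_transistors / nand2_area_um2
            + 0.4 * scanff_transistors / scanff_area_um2)

# Hypothetical cell areas (um^2), purely for illustration:
print(f"~{weighted_density_mtr_mm2(0.03, 0.25):.0f} MTr/mm^2")
```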
 

BorisTheBlade82

Senior member
May 1, 2020
700
1,112
136
As for SRAM and process technology in general, we're far from any fundamental limits. Just because the gains are somewhat lackluster for this one gen doesn't mean we won't see significant improvement in the future. Plenty of room left at the bottom.

So, to you, the trend that we have gotten almost no optical shrinks in the plane and instead have been moving to 3D structures (FinFET, GAA, etc.) for generations already is no clear indication of real technical difficulties?
 
Last edited:

BorisTheBlade82

Senior member
May 1, 2020
700
1,112
136
There are no "actual numbers for the node itself" for Intel 4 or anything newer because Intel isn't shipping those processes in quantity yet. The same is true for TSMC's N3/N3E until that ships in volume. There were supposed numbers for Intel's 10nm years ago, which proved to be fables by the time the actual 10nm process shipped in volume. What matters is what actually ships, not what numbers they come up with during development and crow about in ISSCC.

They are also in no way whatsoever "measured in the same way across foundries". If you think that, I have a bridge I'd like to sell you.
There are indeed, for example from our highly regarded David Schor: https://fuse.wikichip.org/news/6720/a-look-at-intel-4-process-technology/
 
  • Like
Reactions: igor_kavinski

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
So, to you, the trend that we have gotten almost no optical shrinks in the plane and instead have been moving to 3D structures (FinFET, GAA, etc.) for generations already is no clear indication of real technical difficulties?
There are challenges, sure, but that doesn't mean we've hit, or are anywhere close to hitting, density scaling limits. If anything, the bigger problem is that density scaling is well outpacing power scaling.
 
  • Like
Reactions: Tlh97 and Kepler_L2

Geddagod

Golden Member
Dec 28, 2021
1,406
1,527
106
Those who think that the transistor count metrics are comparing apples to apples between foundries (or even between two chips made in the same foundry) might want to read what David Kanter has to say on the subject:

https://www.realworldtech.com/transistor-count-flawed-metric/
Well I'm glad no one here thinks transistor count metrics are apples to apples between foundries.
Theoretical max transistor density is not the same as a transistor count metric.
The chart that I uploaded shows the max density if you just pack as many transistors of that type into a chip as possible, design be damned.
What David Kanter is talking about is comparing transistor density on one chip vs. another chip, which can't be compared, since the designs are different.
For maximum theoretical density, there is no design, because they are literally just adding the exact same transistor over and over. In other words, you are essentially comparing the size of each transistor for each cell type.
And for SRAM size, we are literally comparing the size of each cell. That's not even transistor count or density or anything; it's quite literally the physical size of the cell.
Again, probably the only caveat to looking at just maximum theoretical density node vs. node is design rules that may prevent designers from utilizing the increased density. I'm not talking about design rules as in "oh, its thermal density is too high" or "oh, we need to increase frequency, add in some HP cells here"; I'm talking about design rules as in "due to routing, we physically cannot place these two transistors next to each other".
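And the "pack the exact same cell over and over, design be damned" figure is really just cell-area arithmetic: a standard cell's footprint is roughly (width in contacted poly pitches) x (track height x metal pitch), and you tile it. A sketch with made-up pitches, not any foundry's published numbers:

```python
# Peak density from tiling a single standard cell type, design rules ignored.
# Pitches, track count and cell width are made-up placeholders.

def peak_mtr_per_mm2(cpp_nm: float, m2_pitch_nm: float, tracks: int,
                     cell_width_cpps: int, transistors_per_cell: int) -> float:
    """Million transistors per mm^2 if the whole die were tiled with one cell."""
    cell_area_nm2 = (cell_width_cpps * cpp_nm) * (tracks * m2_pitch_nm)
    cells_per_mm2 = 1e12 / cell_area_nm2      # 1 mm^2 = 1e12 nm^2
    return cells_per_mm2 * transistors_per_cell / 1e6

# Hypothetical 6-track HD library, NAND2 assumed 3 CPP wide with 4 transistors:
print(f"~{peak_mtr_per_mm2(cpp_nm=50, m2_pitch_nm=32, tracks=6, cell_width_cpps=3, transistors_per_cell=4):.0f} MTr/mm^2")
```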
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,100
16,015
136

  • Like
Reactions: igor_kavinski

Edrick

Golden Member
Feb 18, 2010
1,939
230
106

So Intel HEDT is finally back. But only for those that have super cheap electricity or steal it from their rich neighbors :D

Honestly, I would not call these HEDT. They are clearly in the workstation realm (Xeon and TR Pro). In my opinion, HEDT was sort of a middle ground between the consumer lines and the workstation lines. Give us an i9 (all big cores), without ECC RAM, that works on a motherboard that does not cost $900+; then I would consider that HEDT.
 
Jul 27, 2020
26,183
18,038
146
Honestly, I would not call these HEDT. They are clearly in the Workstation realm (Xeon and TR Pro).
Did Intel tout previous workstations with OC/tuning abilities? Fish Hawk seems to be targeting people with too much money who want to play with stuff few others can.
 

Edrick

Golden Member
Feb 18, 2010
1,939
230
106
Did Intel tout previous workstations with OC/tuning abilities? Fish Hawk seems to be targeting people with too much money who want to play with stuff few others can.

No, they did not. We had X299, X99, etc. for all that. But you are right, Intel is targeting people with too much money. I was waiting for Fish Hawk myself (I was on X299), but when they announced the pricing, I went with a 13700K instead. I can't justify ~$2500 for just a motherboard and a 16-core CPU.
 
  • Like
Reactions: igor_kavinski

Edrick

Golden Member
Feb 18, 2010
1,939
230
106
That's 40 W a core. That sounds about right actually.

Based on what calculation? A 13700K pulls ~200 watts at 5.0GHz on Prime95 small FFTs (with E-cores active @ 4.0GHz). For argument's sake, let's say it pulls 200W with the E-cores off. It is on the same process node as SPR, and the P-cores are essentially the same cores (minus AVX-512). To me, 600W seems more likely than 1000W. Now, if the tests were being run using AVX-512, and there was no frequency offset for AVX-512, I could see it using more than 600W, but 1000W still seems very unlikely.
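To make the arithmetic in this exchange explicit, here is a trivial sketch scaling a per-core power estimate up to many cores. The core counts, per-core wattages, and uncore budget are assumptions for illustration (drawn loosely from the 13700K data point above), not measurements of any Xeon:

```python
# Back-of-the-envelope package power from a per-core power estimate.
# Core counts, per-core wattages and the uncore budget are assumptions only.

def package_watts(cores: int, watts_per_core: float, uncore_watts: float = 50.0) -> float:
    """Crude total package power: cores * per-core power + a fixed uncore budget."""
    return cores * watts_per_core + uncore_watts

for cores in (24, 56):
    for w_per_core in (10, 18, 40):
        total = package_watts(cores, w_per_core)
        print(f"{cores:2d} cores @ {w_per_core:2d} W/core -> ~{total:4.0f} W package")
```

Whether 600W or 1000W is more plausible then just depends on which core count and all-core clock you assume.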