Discussion Intel current and future Lakes & Rapids thread


eek2121

Diamond Member
Aug 2, 2005
3,384
5,011
136
Let me rephrase - it's literally marketing. Intel could call its nodes whatever it likes. I don't think Intel has made any actual claims yet as to density or quality.
Yes, but one thing I'll bet money on: Intel 7 is at least as good as TSMC N7, and Intel 4 will be at least as good as TSMC N4, and so on...

Quotes are confidential. You would know that if you had ever dealt with such things. I am done talking with you. Waste of energy.
No, they aren't, unless you are big enough to order tens of thousands of units from Intel directly. Once you involve a third party, this is no longer the case. I can absolutely tell you that the price on Intel ARK is on the high side compared to the servers/hardware I can get from Dell. Often I can get a full system for less than that price.
AMD is not paying you, so stop with the cheerleading lol
AMD has a better core with higher perf/watt. There may be some pro-AMD folks here, but the reality is Intel has fallen behind in absolute performance, in perf/watt, and in compute density and compute density per watt.

People are right to give Intel a hard time.

Intel is likely to right the ship eventually. They always do. Until then, though, bloody noses for everyone!
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
Integer performance: Integer performance is very difficult to increase, and improvement in integer performance indicates the "quality" of the core. It is a complex combination of latency, frequency, balance of execution units, and branch prediction (and pipeline length) across all areas. You cannot make the L1 cache too big, as that increases latency, but it can't be too small either. You cannot just increase the number of branch targets either; latency comes into play, and so does the performance of the prediction algorithm. Decoders won't scale without beefing up the rest of the machine.

How do you increase the top speed of a car from 200 km/h to 300 km/h? Just double the engine displacement and horsepower? No. Aerodynamics need to be greatly improved, since wind resistance is the limit at high speeds. You need the transmission to keep up so it does not fail, and to shift quickly and seamlessly. And you need to do all that while making the car lighter. You need a capable driver, because otherwise an accident might happen. Not a simple solution at all.

The 787 achieved a 20% reduction in fuel consumption by moving to composite materials rather than just aircraft aluminum for the airframe. Then they had to move to lighter lithium battery technology. The engines were replaced with bigger, slower-running high-bypass turbofans. And they had to do aerodynamic work using CFD as well.

That's why integer performance is the most important aspect of a general-purpose CPU. It is said that in order to chase a 1% improvement in performance, CPU architects did what can only be called "heroic" work. It's amazing what they are doing now. The work pays off, though. Improving integer performance, or the uarch in general, benefits everything: deep learning, floating point, word processing, gaming, emulation, snappiness.

That's why it's absolutely foolish when mega-corporations mistreat employees, especially veteran ones. These kinds of decisions require very seasoned, experienced architects with 30+ years of experience. That's an entire lifetime doing nothing but being a CPU architect, and at the top of the field at that!

Density: The L2 array is not the LLC (Last Level Cache) anymore and is effectively part of the core's cache. It has more stringent voltage and power requirements, so the cells used aren't exactly the bog-standard ones used for L3.

They could have improved the density aspect in Redwood Cove compared to the competition, making it more favorable against chips such as Zen 4. This is me speculating based on what you said, though.

Also, there is a fundamental limit. A company that does a better job of scaling down will reach the limits faster than those that don't. You could argue the limits "democratize" compute and make the latest technologies available to small companies and those with fewer resources.

That's why DRAM has been on the 10nm-class node (10-19nm) for 5 years now: 10x, 10y, 10z, 10a, 10b, and they even talk about 10c! They will reach a decade on it, since Micron just announced 10b (1-beta) availability as the most advanced node. 10x = 17-18nm depending on the manufacturer.

Very much a possibility the designations are as follows:
10x = 17-18nm
10y = 16-17nm
10z = 15-16nm
10a = 14-15nm
10b = 13-14nm
10c = 12-13nm

1nm improvement per "generation".

Why? Because DRAM on a 10nm-class node is far, far denser than DRAM built on a logic process (3x the density of eDRAM), and that in turn is denser than logic cells. TSMC is showing almost no gains for SRAM on the N3 node, meaning even SRAM is hitting hard limits. Logic isn't hitting them yet because it's less dense.
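To put rough numbers on that density ladder, here is a minimal sketch. The bitcell areas are approximate, publicly reported figures (commodity DRAM as a 6F² cell at a ~16nm half-pitch, Intel's 22nm eDRAM and high-density SRAM cells, TSMC's N5/N3E high-density SRAM cell); treat them as illustrative, not exact.

```python
# Rough bits-per-mm^2 comparison of DRAM, eDRAM and SRAM bitcells.
# Cell areas are approximate, publicly reported figures; real arrays are
# less dense because of sense amps, decoders, redundancy, etc.

bitcell_um2 = {
    "Commodity DRAM (6F^2, F ~ 16nm)": 6 * (0.016 ** 2),  # ~0.0015 um^2/bit
    "Intel 22nm eDRAM":                0.029,
    "Intel 22nm HD SRAM":              0.092,
    "TSMC N5 HD SRAM":                 0.021,
    "TSMC N3E HD SRAM":                0.021,              # ~no shrink vs N5
}

for name, area_um2 in bitcell_um2.items():
    mbit_per_mm2 = 1.0 / area_um2   # bits per um^2 == Mbit per mm^2 numerically
    print(f"{name:32s} ~{mbit_per_mm2:6.0f} Mbit/mm^2 (bitcell only)")
```

The point is just the ordering, not the exact values.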
I think you're being a touch dramatic here. On the topic of CPU architecture and IPC, 1% is easy. You can get that by slightly increasing the sizes of existing structures or adding a few smaller features throughout the pipeline, all without materially impacting your timing. 10-20% is harder, but typical of generational changes. Beyond that is where things start to get truly interesting.

As for SRAM and process technology in general, we're far from any fundamental limits. Just because the gains are somewhat lackluster for this one gen doesn't mean we won't see significant improvement in the future. Plenty of room left at the bottom.
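As a quick illustration of how those generational numbers compound, here is a trivial sketch (the percentages are just examples, not claims about any particular core):

```python
# How per-generation IPC gains compound over several generations.
# Percentages are purely illustrative.

def cumulative_ipc(gain_per_gen: float, generations: int) -> float:
    """Cumulative IPC multiplier after N generations of equal gains."""
    return (1.0 + gain_per_gen) ** generations

for gain in (0.01, 0.10, 0.20):
    print(f"{gain:.0%} per gen over 5 gens -> {cumulative_ipc(gain, 5):.2f}x")
```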
 
  • Like
Reactions: Tlh97

Geddagod

Golden Member
Dec 28, 2021
1,406
1,527
106
In hindsight, that leak about Granite Rapids introducing higher bandwidth per core suddenly becomes way more important, if they were referring to L3 bandwidth.
Because the way I see it, this was the conclusion of the Chips and Cheese testing of SPR:
L3 latency is high and bandwidth is mediocre. If SPR faces an all-core load with a poor L2 hit rate, multi-threaded scaling could be impacted by the limited L3 bandwidth.
SPR has slightly higher caching capacity but far worse performance characteristics, which will make its advantages even less widely applicable. The L3 capacity flexibility is overshadowed by AMD’s L3 performance advantages, and EPYC has so much L3 that it can afford to duplicate some shared data around and still do well.
I think the main issue however is the lack of L3 bandwidth, not necessarily the high L3 latency.
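A crude way to see why aggregate L3 bandwidth, rather than latency, can be the wall: divide whatever the shared L3 can sustain by the number of active cores and compare that with what each core wants when its L2 hit rate is poor. This is a toy model; the numbers are placeholders, not measured SPR or Genoa figures.

```python
# Toy model: per-core share of shared L3 bandwidth vs. per-core demand.
# All numbers are placeholders for illustration, not measurements.

def per_core_share(aggregate_l3_gbs: float, active_cores: int) -> float:
    """Fair-share L3 bandwidth per core in GB/s."""
    return aggregate_l3_gbs / active_cores

aggregate_l3_gbs = 800.0   # hypothetical sustained bandwidth of the whole L3
active_cores = 56          # hypothetical core count
demand_gbs = 25.0          # hypothetical L2-miss traffic per core

share = per_core_share(aggregate_l3_gbs, active_cores)
print(f"per-core L3 share : {share:.1f} GB/s")
print(f"per-core demand   : {demand_gbs:.1f} GB/s")
print("L3-bandwidth bound" if demand_gbs > share else "not L3-bandwidth bound")
```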
 
  • Like
Reactions: wilds

Doug S

Diamond Member
Feb 8, 2020
3,323
5,800
136
There are actual numbers of the node itself, which are measured the same way across foundries, that you could use to compare Intel 4 and TSMC nodes.


There are no "actual numbers for the node itself" for Intel 4 or anything newer because Intel isn't shipping those processes in quantity yet. The same is true for TSMC's N3/N3E until that ships in volume. There were supposed numbers for Intel's 10nm years ago, which proved to be fables by the time the actual 10nm process shipped in volume. What matters is what actually ships, not what numbers they come up with during development and crow about in ISSCC.

They are also in no way whatsoever "measured in the same way across foundries". If you think that, I have a bridge I'd like to sell you.
 

Geddagod

Golden Member
Dec 28, 2021
1,406
1,527
106
There are no "actual numbers for the node itself" for Intel 4 or anything newer because Intel isn't shipping those processes in quantity yet. The same is true for TSMC's N3/N3E until that ships in volume. There were supposed numbers for Intel's 10nm years ago, which proved to be fables by the time the actual 10nm process shipped in volume. What matters is what actually ships, not what numbers they come up with during development and crow about in ISSCC.

They are also in no way whatsoever "measured in the same way across foundries". If you think that, I have a bridge I'd like to sell you.
There are: theoretical max transistors per mm², which is comparable across many different foundries.
[Attached image: 1678686744999.png - peak transistor density comparison]
And here's the same for SRAM
[Attached image: 1678686774446.png - SRAM cell size comparison]

Yes, Intel 4 and TSMC 3nm aren't shipping in volume yet, but we know the size of each of the cells because both companies have disclosed those numbers.
"Oh, but no company ever reaches the theoretical max density" - well yes, because they use a different mix of logic cells and SRAM cells. That doesn't make the metric any less useful for comparing node to node.
The thing about your 10nm example is that TechInsights (I believe) actually did find Cannon Lake hitting ~100 MTr/mm² in some parts of the chip.
As for the higher-volume 10nm ESF parts: despite the fact that ESF increased the gate pitch, the overall density of the HP cells actually increased, because fewer buffers were needed between the HP cell libraries, according to Intel.
The numbers Intel quotes in its slides are essentially the same ones that ship. Intel just seems to use HP cells as its standard library, while AMD uses HD cells (surprisingly, even for Zen 4 on 5nm).
Probably the only reasonable caveat to bring up would be Muh Design Rules, but that's about it.
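For what it's worth, the "theoretical max transistors per mm²" figures in charts like the one above are usually derived from standard-cell geometry. One widely cited scheme is Intel's weighted-density metric (0.6x a 2-input NAND plus 0.4x a scan flip-flop, each as transistors divided by cell area). A minimal sketch of that arithmetic, with placeholder cell areas rather than any published Intel 4 or N3 values:

```python
# Weighted transistor-density metric in the style Intel proposed (Bohr, 2017):
#   density = 0.6 * (NAND2 transistors / NAND2 area)
#           + 0.4 * (scan-FF transistors / scan-FF area)
# Cell areas below are placeholders to show the arithmetic only.

def weighted_density_mtr_mm2(nand2_area_um2: float, scanff_area_um2: float,
                             scanff_transistors: int = 36) -> float:
    """Million transistors per mm^2 for a 60/40 NAND2 / scan flip-flop mix.

    Transistors per um^2 is numerically equal to MTr per mm^2.
    The scan flip-flop transistor count is an assumption.
    """
    nand2_transistors = 4
    return (0.6 * nand2_transistors / nand2_area_um2
            + 0.4 * scanff_transistors / scanff_area_um2)

# Hypothetical cell areas (um^2), purely for illustration:
print(f"~{weighted_density_mtr_mm2(0.03, 0.25):.0f} MTr/mm^2")
```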
 

BorisTheBlade82

Senior member
May 1, 2020
700
1,112
136
As for SRAM and process technology in general, we're far from any fundamental limits. Just because the gains are somewhat lackluster for this one gen doesn't mean we won't see significant improvement in the future. Plenty of room left at the bottom.

So, to you, the trend that we have gotten almost no optical shrinks in the plane and instead have been moving to 3D structures (FinFET, GAA, etc.) for generations already is no clear indication of real technical difficulties?
 
Last edited:

BorisTheBlade82

Senior member
May 1, 2020
700
1,112
136
There are no "actual numbers for the node itself" for Intel 4 or anything newer because Intel isn't shipping those processes in quantity yet. The same is true for TSMC's N3/N3E until that ships in volume. There were supposed numbers for Intel's 10nm years ago, which proved to be fables by the time the actual 10nm process shipped in volume. What matters is what actually ships, not what numbers they come up with during development and crow about in ISSCC.

They are also in no way whatsoever "measured in the same way across foundries". If you think that, I have a bridge I'd like to sell you.
There are indeed, for example from our highly regarded David Schor: https://fuse.wikichip.org/news/6720/a-look-at-intel-4-process-technology/
 
  • Like
Reactions: igor_kavinski

Exist50

Platinum Member
Aug 18, 2016
2,452
3,106
136
So, to you, the trend that we have gotten almost no optical shrinks in the plane and instead have been moving to 3D structures (FinFET, GAA, etc.) for generations already is no clear indication of real technical difficulties?
There are challenges, sure, but that doesn't mean we've hit, or are anywhere close to hitting, density scaling limits. If anything, the bigger problem is that density scaling is well outpacing power scaling.
 
  • Like
Reactions: Tlh97 and Kepler_L2

Geddagod

Golden Member
Dec 28, 2021
1,406
1,527
106
Those who think that the transistor count metrics are comparing apples to apples between foundries (or even between two chips made in the same foundry) might want to read what David Kanter has to say on the subject:

https://www.realworldtech.com/transistor-count-flawed-metric/
Well I'm glad no one here thinks transistor count metrics are apples to apples between foundries.
Theoretical max transistor density is not the same as a transistor count metric.
The chart that I uploaded shows the max density if you just pack as many transistors of that type into a chip as possible, design be damned.
What David Kanter is talking about is comparing transistor density on one chip vs. another chip, which can't be compared, since the designs are different.
For maximum theoretical density, there is no design, because they are literally just adding the exact same transistor over and over. In other words, you are essentially comparing the size of each transistor for each cell type.
And for SRAM size, we are literally comparing the size of each cell. That's not even transistor count or density or anything; it's quite literally the physical size of the cell.
Again, probably the only caveat to looking at just maximum theoretical density node vs. node is design rules that may prevent designers from utilizing the increased density. I'm not talking about design rules as in "oh, its thermal density is too high" or "oh, we need to increase frequency, add in some HP cells here"; I'm talking about design rules as in "due to routing, we physically cannot place these two transistors next to each other".
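And the "pack the exact same cell over and over, design be damned" figure is really just cell-area arithmetic: a standard cell's footprint is roughly (width in contacted poly pitches) x (track height x metal pitch), and you tile it. A sketch with made-up pitches, not any foundry's published numbers:

```python
# Peak density from tiling a single standard cell type, design rules ignored.
# Pitches, track count and cell width are made-up placeholders.

def peak_mtr_per_mm2(cpp_nm: float, m2_pitch_nm: float, tracks: int,
                     cell_width_cpps: int, transistors_per_cell: int) -> float:
    """Million transistors per mm^2 if the whole die were tiled with one cell."""
    cell_area_nm2 = (cell_width_cpps * cpp_nm) * (tracks * m2_pitch_nm)
    cells_per_mm2 = 1e12 / cell_area_nm2      # 1 mm^2 = 1e12 nm^2
    return cells_per_mm2 * transistors_per_cell / 1e6

# Hypothetical 6-track HD library, NAND2 assumed 3 CPP wide with 4 transistors:
print(f"~{peak_mtr_per_mm2(cpp_nm=50, m2_pitch_nm=32, tracks=6, cell_width_cpps=3, transistors_per_cell=4):.0f} MTr/mm^2")
```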
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
27,100
16,015
136

  • Like
Reactions: igor_kavinski

Edrick

Golden Member
Feb 18, 2010
1,939
230
106

So Intel HEDT is finally back. But only for those that have super cheap electricity or steal it from their rich neighbors :D

Honestly, I would not call these HEDT. They are clearly in the workstation realm (Xeon and TR Pro). In my opinion, HEDT was sort of a middle ground between the consumer lines and the workstation lines. Give us an i9 (all big cores), without ECC RAM, that works on a motherboard that does not cost $900+; then I would consider that HEDT.
 
Jul 27, 2020
26,183
18,038
146
Honestly, I would not call these HEDT. They are clearly in the Workstation realm (Xeon and TR Pro).
Did Intel tout previous workstations with OC/tuning abilities? Fish Hawk seems to be targeting people with too much money who want to play with stuff few others can.
 

Edrick

Golden Member
Feb 18, 2010
1,939
230
106
Did Intel tout previous workstations with OC/tuning abilities? Fish Hawk seems to be targeting people with too much money who want to play with stuff few others can.

No, they did not. We had X299, X99, etc. for all that. But you are right, Intel is targeting people with too much money. I was waiting for Fish Hawk myself (I was on X299), but when they announced the pricing, I went with a 13700K instead. I can't justify ~$2500 for just a motherboard and a 16-core CPU.
 
  • Like
Reactions: igor_kavinski

Edrick

Golden Member
Feb 18, 2010
1,939
230
106
That's 40 W a core. That sounds about right actually.

Based on what calculation? A 13700K pulls ~200 watts at 5.0GHz on Prime95 small FFTs (with E-cores active @ 4.0GHz). For argument's sake, let's say it pulls 200W with the E-cores off. It is on the same process node as SPR, and the P-cores are essentially the same cores (minus AVX-512). To me, 600W seems more likely than 1000W. Now, if the tests were being run using AVX-512, and there was no frequency offset for AVX-512, I could see it using more than 600W, but 1000W still seems very unlikely.
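To make the arithmetic in this exchange explicit, here is a trivial sketch scaling a per-core power estimate up to many cores. The core counts, per-core wattages, and uncore budget are assumptions for illustration (drawn loosely from the 13700K data point above), not measurements of any Xeon:

```python
# Back-of-the-envelope package power from a per-core power estimate.
# Core counts, per-core wattages and the uncore budget are assumptions only.

def package_watts(cores: int, watts_per_core: float, uncore_watts: float = 50.0) -> float:
    """Crude total package power: cores * per-core power + a fixed uncore budget."""
    return cores * watts_per_core + uncore_watts

for cores in (24, 56):
    for w_per_core in (10, 18, 40):
        total = package_watts(cores, w_per_core)
        print(f"{cores:2d} cores @ {w_per_core:2d} W/core -> ~{total:4.0f} W package")
```

Whether 600W or 1000W is more plausible then just depends on which core count and all-core clock you assume.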