Discussion Intel current and future Lakes & Rapids thread


moinmoin

Diamond Member
Jun 1, 2017
3,827
5,597
136
Simulation can't help debug issues on the silicon itself.

I don't know about Intel's current state in that regard, but Keller's most publicized push for change within AMD was for more real-time monitoring of the chip from within, through the addition of hundreds to thousands of sensors as the Scalable Control Fabric portion of Infinity Fabric. While these are said to be used by the chip to e.g. optimize power usage and adapt to degradation during use, they are obviously also very helpful during development and binning.

I'd expect that if Intel was lacking in that area before (as SPR seems to indicate), Keller pushed for such improvements at Intel as well.
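For a sense of how much of that on-die telemetry already reaches software, here's a rough sketch (assuming a Linux system whose sensors are surfaced through the standard hwmon sysfs interface; driver names and exact paths vary by platform, and the design-time/binning instrumentation goes well beyond what's user-visible):

Code:
# Minimal sketch: read whatever on-die temperature sensors the kernel
# exposes through the standard hwmon sysfs interface (paths and driver
# names vary by platform; k10temp/coretemp are just common examples).
from pathlib import Path

def read_hwmon_temps():
    readings = {}
    for hwmon in Path("/sys/class/hwmon").glob("hwmon*"):
        name = (hwmon / "name").read_text().strip()
        for temp_input in hwmon.glob("temp*_input"):
            label_file = hwmon / temp_input.name.replace("_input", "_label")
            label = label_file.read_text().strip() if label_file.exists() else temp_input.name
            millic = int(temp_input.read_text().strip())
            readings[f"{name}/{label}"] = millic / 1000.0  # millidegrees C -> degrees C
    return readings

if __name__ == "__main__":
    for sensor, celsius in sorted(read_hwmon_temps().items()):
        print(f"{sensor}: {celsius:.1f} C")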
 

LightningZ71

Golden Member
Mar 10, 2017
1,403
1,573
136
Simulation tools are great for the actual digital logic design on a theoretical level. They are good, but not perfect, at simulating your intended implementation on fully known and modeled silicon implementations where you are absolutely sure about every behavior in every situation. They are often, at best, an educated guess when you are dealing with what is essentially your leading-edge silicon in one of its largest implementations. Silicon is not an exact science; there are very minute differences in every wafer and chip. What works on 95% of them may not work on that last 5% exactly 100% of the time. Very minor differences in the chemistry of the various layers can make unexpected changes in the timing of signals propagating along a pathway or the behavior of a specific transistor, requiring you to go back and build in additional margin at the silicon level to get your yields to where you want them to be. This is all a vast oversimplification of the process, but simulation can only go so far, and there's a lot that doesn't get captured at the simulation level for designs that are expected to run at the bleeding edge of capability, 24/7, with effectively zero errors.
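To put toy numbers on that margin-vs-yield trade-off, a quick sketch (all delays and distributions invented, nothing like a real STA or yield flow):

Code:
# Toy illustration (not a real STA flow): model the delay of one critical
# path as a normal distribution across dies, then see how the fraction of
# dies meeting a target clock period changes as design margin is added.
import random

def yield_estimate(mean_delay_ps, sigma_ps, period_ps, n_dies=100_000, seed=1):
    rng = random.Random(seed)
    passing = sum(rng.gauss(mean_delay_ps, sigma_ps) <= period_ps for _ in range(n_dies))
    return passing / n_dies

period = 250.0                           # 4 GHz target -> 250 ps cycle
for margin in (0.0, 5.0, 10.0, 20.0):    # extra slack designed in, in ps
    y = yield_estimate(mean_delay_ps=240.0 - margin, sigma_ps=6.0, period_ps=period)
    print(f"designed-in margin {margin:4.1f} ps -> ~{y:.1%} of dies meet 4 GHz")

With the invented numbers above you get roughly the "95% vs that last 5%" situation at zero margin, and a few extra picoseconds of designed-in slack pushes the passing fraction toward 100%.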
 

nicalandia

Platinum Member
Jan 10, 2019
2,192
3,293
106
And even with a final release product with no known bugs, it is always the case that chips built at the end of the product cycle (for example late-build Zen 3) perform better overall than the first release samples, due to what is called "process maturity".
 
  • Like
Reactions: Tlh97 and Vattila

moinmoin

Diamond Member
Jun 1, 2017
3,827
5,597
136
In short (already implied by several of the previous posters):
Simulation is inherently digital, binary, black and white; the analogue reality is inherently grayscale. There can be a lot of interdependence and interference that isn't yet perfectly accounted for in simulations. And this gets harder the smaller the nodes get.

Back to the original topic:
More and better monitoring on the silicon itself helps both to speed up debugging of such corner cases and to optimize the simulation (where with this approach real and simulated sensors can be matched and more closely aligned over time).
 

nicalandia

Platinum Member
Jan 10, 2019
2,192
3,293
106
Sisoftware Sandra just posted an AMD ThreadRipper PRO 5995WX entry in its database for both Native Arithmetic and Processor Multimedia.

So of course it's time for a head-to-head comparison between the top-of-the-line Xeon W9-3495X (ES, only one entry) and the current top-of-the-line ThreadRipper PRO (only one entry).


Intel Xeon W9-3495X: Arithmetic Native: 1,477.63 GOPS
AMD ThreadRipper PRO 5995WX: Arithmetic Native: 1,433 GOPS

Intel Xeon W9-3495X: Processor Multimedia: 7,928.76 Mpix/s
AMD ThreadRipper PRO 5995WX: Processor Multimedia: 6,016.41 Mpix/s


Overall it's a pretty strong showing by Sapphire Rapids, especially flexing its muscle on AVX-512. And while this is the only entry for the 5995WX, and there are other entries for the older model (3995WX) which show higher performance, that is also the only entry for the top-of-the-line Xeon W9...
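For what it's worth, quick arithmetic on the two entries quoted above:

Code:
# Quick relative-performance arithmetic from the Sandra entries quoted above.
results = {
    "Arithmetic Native (GOPS)":      {"Xeon W9-3495X": 1477.63, "TR PRO 5995WX": 1433.00},
    "Processor Multimedia (Mpix/s)": {"Xeon W9-3495X": 7928.76, "TR PRO 5995WX": 6016.41},
}
for test, scores in results.items():
    intel, amd = scores["Xeon W9-3495X"], scores["TR PRO 5995WX"]
    print(f"{test}: Xeon ahead by {(intel / amd - 1) * 100:.1f}%")
# Arithmetic Native: ~3.1% ahead; Processor Multimedia: ~31.8% ahead.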

@Hans de Vries We need your magic here...

Sources:

 
Last edited:
  • Like
Reactions: lightmanek

Doug S

Golden Member
Feb 8, 2020
1,331
1,979
106
Simulation tools are great for the actual digital logic design on a theoretical level. They are good, but not perfect, at simulating your intended implementation on fully known and modeled silicon implementations where you are absolutely sure about every behavior in every situation. They are often, at best, an educated guess when you are dealing with what is essentially your leading-edge silicon in one of its largest implementations. Silicon is not an exact science; there are very minute differences in every wafer and chip. What works on 95% of them may not work on that last 5% exactly 100% of the time. Very minor differences in the chemistry of the various layers can make unexpected changes in the timing of signals propagating along a pathway or the behavior of a specific transistor, requiring you to go back and build in additional margin at the silicon level to get your yields to where you want them to be. This is all a vast oversimplification of the process, but simulation can only go so far, and there's a lot that doesn't get captured at the simulation level for designs that are expected to run at the bleeding edge of capability, 24/7, with effectively zero errors.

I remember some years ago reading about some new chip design (I can't remember the details or even if it was x86 or RISC) where they had successfully booted the OS on it prior to tape-out. They were pretty proud of that accomplishment, and it seemed to be as much about having a simulator capable of that level of performance as much as the successful boot.

What you're talking about here with that "last 5%" is process variation that isn't from defects as such (i.e. it isn't a situation where a core can't pass validation), but where you get a core that can't operate at the target frequency. As I understand it, the simulators can handle timing closure and ensuring there's enough slack between stages to handle the types of issues you describe gracefully. They'd be able to flag e.g. pipeline stages in a given block as a potential timing issue so designers can make changes to address it.

Different companies will handle timing closure differently. If you are Intel or AMD and able to bin everything to the nth degree, you can be pretty aggressive with timing, since that gives you faster bins to sell, but you also have bins for the parts that have the issues you describe ("unexpected changes in the timing of signals propagating along a pathway"). Those can either be sold at the low end, or power is adjusted and they're binned at a higher TDP to achieve the desired frequency. Apple would be forced to have more timing slack since their frequency binning is pass/fail, and parts that can't operate at the target frequency and power are scrapped.
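A toy version of that slack bookkeeping, just to make the idea concrete (made-up arrival times and setup numbers, not a real STA run):

Code:
# Toy slack check in the spirit of static timing analysis: slack is the
# clock period minus (data arrival time + setup requirement); negative
# slack flags a stage that won't close timing. All numbers are made up.
PERIOD_PS = 250.0   # 4 GHz target
SETUP_PS = 15.0

pipeline_stages = {          # stage name -> worst-case data arrival time (ps)
    "fetch":     210.0,
    "decode":    225.0,
    "execute":   245.0,      # this one should get flagged
    "writeback": 200.0,
}

for stage, arrival in pipeline_stages.items():
    slack = PERIOD_PS - (arrival + SETUP_PS)
    status = "OK" if slack >= 0 else "VIOLATION"
    print(f"{stage:10s} arrival={arrival:6.1f} ps  slack={slack:6.1f} ps  {status}")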
 
  • Like
Reactions: Tlh97 and Vattila

LightningZ71

Golden Member
Mar 10, 2017
1,403
1,573
136
Like I said, a vast oversimplification. I remember, back in my days in college for my Computer Engineering degree, using Verilog to design relatively simple processors for various projects or even just for fun (because I had a warped definition of fun back then) and booting an operating system on the simulation. Yes, my own operating systems were quite simple "proof of function" things. Others were just standard implementations of old 8-bit OSes from the past. Running an OS in a simulation environment isn't something astounding. Oh, you wouldn't expect anything approaching hardware-level performance, but with a fast enough system you could prove it works well enough.

Even after all that, with the resources that Intel SHOULD have at their disposal, it seems odd to me that it would take them this many hardware spins of the project to get it to production level. Something isn't quite right here in my view. They are likely pushing the edge really hard somewhere and it's biting them in the rear.
 

Thala

Golden Member
Nov 12, 2014
1,335
645
136
In short (already implied by several of the previous posters):
Simulation is inherently digital, binary, black and white; the analogue reality is inherently grayscale. There can be a lot of interdependence and interference that isn't yet perfectly accounted for in simulations. And this gets harder the smaller the nodes get.
This statement does not seem to come from experience. The vast majority of issues which lead to additional steppings could have been found by RTL simulation. As I explained in my previous post, it is largely a coverage problem, not an inherent problem of the simulation of digital circuits.
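To make the coverage point concrete with a toy example (invented knobs and counts, not any real verification plan): cross a few independent stimulus dimensions and the bin count explodes, and purely random stimulus leaves exactly the kind of corner-case tail that later shows up as an extra stepping.

Code:
# Toy illustration of why coverage is the bottleneck: with a cross of a
# few independent knobs the bin count explodes, and random stimulus
# leaves a long tail of bins (corner cases) unhit.
import itertools, random

opcodes  = range(64)     # pretend ISA knobs
operands = range(16)
pipeline = range(8)      # e.g. which unit / bypass path
bins = set(itertools.product(opcodes, operands, pipeline))   # 8192 cross bins

rng = random.Random(0)
hit = set()
for _ in range(20_000):  # 20k random "tests"
    hit.add((rng.randrange(64), rng.randrange(16), rng.randrange(8)))

print(f"cross bins: {len(bins)}, hit: {len(hit)}, "
      f"still uncovered: {len(bins - hit)} ({len(bins - hit)/len(bins):.1%})")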

Back to the original topic:
More and better monitoring on the silicon itself helps both to speed up debugging of such corner cases and to optimize the simulation (where with this approach real and simulated sensors can be matched and more closely aligned over time).
I have yet to see one of our verification engineers ever simulate a sensor - what do you even expect a simulated sensor to help with? Even our thermal/activity simulations just simulate the heat distribution over the die area and might give you a few hints on where to place the sensors - but the sensors themselves are never simulated.
 
Last edited:

Doug S

Golden Member
Feb 8, 2020
1,331
1,979
106
Like I said, a vast oversimplification. I remember, back in my days in college for my Computer Engineering degree, using Verilog to design relatively simple processors for various projects or even just for fun (because I had a warped definition of fun back then) and booting an operating system on the simulation. Yes, my own operating systems were quite simple "proof of function" things. Others were just standard implementations of old 8-bit OSes from the past. Running an OS in a simulation environment isn't something astounding. Oh, you wouldn't expect anything approaching hardware-level performance, but with a fast enough system you could prove it works well enough.

Even after all that, with the resources that Intel SHOULD have at their disposal, it seems odd to me that it would take them this many hardware spins of the project to get it to production level. Something isn't quite right here in my view. They are likely pushing the edge really hard somewhere and it's biting them in the rear.

You obviously have some pretty direct experience in this arena, so I'll defer to you, but I imagine simulating a modern 64-bit CPU (even with shortcuts for massive but highly regular structures like cache, where circuit-level simulation wouldn't be necessary) booting a bloated modern OS like Windows, or worse something like HP-UX or AIX (which in my not-so-recent experience would best case require several minutes to boot on the highest-end hardware of the day, not counting RAM checks, which the simulated OS would skip), is a totally different animal.

I agree that Intel shouldn't need this many spins with the simulation tools available to them, which means the problems go deeper. So even if they defy all odds and have 20A available at or before the time N2 reaches mass production, that doesn't mean they'll be able to deliver many CPUs made on that process - especially the high-dollar server/workstation CPUs. It is very strange how incompetent they've become since the mid-2010s.
 

nicalandia

Platinum Member
Jan 10, 2019
2,192
3,293
106
As someone asked me on Twitter: why would Intel find so many bugs in Sapphire Rapids if they had no such issues with ADL and now RTL?

I believe that the issues/bugs are not in the core/logic/SDRAM parts of the CPU, but in the whole SoC system (compute tiles, mesh interconnect, HBM, UPI links). I am not a CPU engineer, but trying to simulate such a complex SoC could be more complex than simulating a simple x86_64 CPU.
 

nicalandia

Platinum Member
Jan 10, 2019
2,192
3,293
106
Sisoftware Sandra just posted an AMD ThreadRipper PRO 5995WX entry in its database for both Native Arithmetic and Processor Multimedia.

So of course it's time for a head-to-head comparison between the top-of-the-line Xeon W9-3495X (ES, only one entry) and the current top-of-the-line ThreadRipper PRO (only one entry).
So I made an Intel vs AMD post and no one bats an eye or loses their mind? What's going on here?

Where is @Hans Gruber , where is @Markfw
 

Markfw

CPU Moderator, VC&G Moderator, Elite Member
Super Moderator
May 16, 2002
23,407
12,378
136
So I made an Intel vs AMD post and no one bats an eye or loses their mind? What's going on here?

Where is @Hans Gruber , where is @Markfw
What I saw was what appeared to be a representative post of a benchmark that showed Sapphire Rapids in a decent light. Nobody can argue the facts when there is so little information on SR. And it was NOT comparing against Milan or Genoa, but against the (soon to be) one-gen-back workstation part. I know that the cores are pretty strong, and at 2.5 GHz they're probably not sucking power like crazy. Too bad they did not do something like that with ADL.
 

nicalandia

Platinum Member
Jan 10, 2019
2,192
3,293
106
Nobody can argue the facts when there is so little information on SR. And it was NOT comparing against Milan or Genoa, but against the (soon to be) one-gen-back workstation part.
I am not sure when Intel will be releasing the Workstation W5/W7/W9 Sapphire Rapids-X parts, but by release date they will be compared directly with the Zen 3 ThreadRipper PRO, which was released to the DIY market a month ago. So it will be with us at least until September-October 2023.
 
Last edited:
  • Like
Reactions: Tlh97 and ftt

Markfw

CPU Moderator, VC&G Moderator, Elite Member
Super Moderator
May 16, 2002
23,407
12,378
136
There's no need to kick the hornet's nest...
If we had a retail SR vs a retail Genoa (out close to the same time???) tested by Phoronix or somebody like that who tests server chips, we could discuss the results. But the above test is a yawner: interesting, but not enough information to argue or discuss.

I am not a one-MFG supporter. It's whoever is best at what they are doing (as in desktop, HEDT, server, laptop). Until ADL, there was no competitive product in any area, except maybe mobile.

NOW we can discuss ADL, and soon Zen 4, and probably soon after that Raptor Lake. Not sure when the server world will be competing again; Intel themselves said they would be losing server market share for a while.

It's only a hornet's nest when people refuse to admit the truth.
 

nicalandia

Platinum Member
Jan 10, 2019
2,192
3,293
106
If we had a retail SR vs a retail Genoa (out close to the same time???) tested by Phoronix or somebody like that who tests server chips, we could discuss the results. But the above test is a yawner: interesting, but not enough information to argue or discuss.
This is the thing, Mark. These are not server chips; Sapphire Rapids-X SKUs will be competing with ThreadRipper PRO SKUs. And because AMD has taken longer to update their product line, the TR PRO line is now nearly a generation behind desktop, which only benefits Intel when they release their workstation SKUs.

Will AMD release a Zen 4-based ThreadRipper PRO soon after Intel releases their W9 line? That we will need to wait and see.
 

Markfw

CPU Moderator, VC&G Moderator, Elite Member
Super Moderator
May 16, 2002
23,407
12,378
136
This is the thing, Mark. These are not server chips; Sapphire Rapids-X SKUs will be competing with ThreadRipper PRO SKUs. And because AMD has taken longer to update their product line, the TR PRO line is now nearly a generation behind desktop, which only benefits Intel when they release their workstation SKUs.

Will AMD release a Zen 4-based ThreadRipper PRO soon after Intel releases their W9 line? That we will need to wait and see.
Sorry, not keeping up with all the SKUs, and there are no professional reviews to reference.
 

moinmoin

Diamond Member
Jun 1, 2017
3,827
5,597
136
The vast majority of issues which lead to additional steppings could have been found by RTL simulation. As I explained in my previous post, it is largely a coverage problem, not an inherent problem of the simulation of digital circuits.
That's putting Intel in a worse light though. If I understood you correctly, you are essentially claiming that SPR is more complex than Intel was able to cover with its simulation, and that Intel took on the cost of additional steppings instead.

I have yet to see one of our verification engineers ever simulate a sensor - what do you even expect a simulated sensor to help with? Even our thermal/activity simulations just simulate the heat distribution over the die area and might give you a few hints on where to place the sensors - but the sensors themselves are never simulated.
That seems a little too nitpicky to me. You don't have to simulate the sensor itself, but of course you'd need to simulate whatever the sensor is measuring at the exact place it's located for the two to be comparable (especially with the prevalent issue of hotspots). And that's not only about heat distribution.
 

Exist50

Golden Member
Aug 18, 2016
1,162
1,157
136
I don't know much about CPU development, but I thought that there are some simulation tools available to avoid redoing the silicon again and again.

Where does the information that it is still not ready come from? Would it be possible to get some intel about what is currently wrong with it?

Learning about the past problems and the process of fixing them would be most interesting to many, I believe. Intel should publish a study about it to improve general knowledge of processor development.
So, a few comments on how hardware validation is done. First of all, the tools and methodologies absolutely exist to do this kind of testing, but it's not automatic. You need to create test benches, simulation environments, etc. But there's no theoretical reason that A0 silicon cannot be completely free of RTL bugs. Analog can be somewhat trickier, but an industry-standard way of mitigating that is to have test chips for high-risk IP (think high-speed PHYs, power delivery, etc.). Those are basically a collection of key circuits and some full IPs that you actually manufacture so you can test them post-Si without taping out the full design. You can do test chips for digital as well, and they're also very useful for process learnings.

But back to debug: there are tradeoffs to consider. Ideally, you want to catch bugs as early in the process as possible (i.e. IP or sub-IP level simulation), because that's where the cost to fix them is lowest and you have the most observability into the issue. Larger (e.g. SoC-level) simulation is quite slow but useful for flushing out more bugs, and as you get into FPGAs and emulation, it's much more expensive with lower observability, but helpful for finding yet more bugs. Post-Si is a whole other ballgame, however. A single minute in post-Si can run more full-chip cycles than all of pre-Si validation combined, so it's the most powerful tool for identifying whether a bug exists. However, you have very poor observability into where the bug is coming from, and obviously the lead time to fix it and the cost of doing so are brutal.
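Rough orders of magnitude behind that last point (speeds here are commonly quoted ballparks, not measurements of any particular flow):

Code:
# Back-of-envelope comparison of how many full-chip cycles a minute of
# silicon delivers versus a full-chip RTL simulator or emulator (speeds
# are rough, commonly quoted orders of magnitude, not real measurements).
SILICON_HZ   = 2e9     # a modest 2 GHz part
SIM_HZ       = 100     # full-chip RTL simulation, optimistically ~100 cycles/s
EMULATION_HZ = 1e6     # hardware emulation, roughly ~1 MHz

silicon_cycles_per_min = SILICON_HZ * 60
print(f"silicon, 1 minute: {silicon_cycles_per_min:.1e} cycles")
print(f"same cycle count in RTL sim: {silicon_cycles_per_min / SIM_HZ / 86400 / 365:.0f} simulator-years")
print(f"same cycle count in emulation: {silicon_cycles_per_min / EMULATION_HZ / 86400:.1f} emulator-days")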

The story I heard was that one of Keller's big goals at Intel was rebuilding their pre-Si validation so they wouldn't be stuck in this sort of situation. But I also heard that he took a sort of "damage is done" approach to SPR, and basically had them spam steppings in an attempt to get it to market as fast as possible. Supposedly the original A0 stepping was basically a glorified test chip.

Edit: Also, process issues can easily drive more steppings, even if nothing is wrong with the RTL. But that's clearly insufficient to explain the continued issues with SPR.
 
