Discussion Intel current and future Lakes & Rapids thread


moinmoin

Diamond Member
Jun 1, 2017
3,827
5,597
136
Simulation can't help debug issues on the silicon itself.

I don't know about Intel's current state in that regard, but Keller's most publicized push for change within AMD was for more real-time monitoring of the chip from within, through the addition of hundreds to thousands of sensors as the Scalable Control Fabric portion of Infinity Fabric. While these are said to be used by the chip to e.g. optimize power usage and adapt to degradation during use, they are obviously also very helpful during development and binning.

I'd expect that if Intel was lacking in that area before (as SPR seems to indicate), Keller pushed for such improvements at Intel as well.
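For a sense of how much of that on-die telemetry already reaches software, here's a rough sketch (assuming a Linux system whose sensors are surfaced through the standard hwmon sysfs interface; driver names and exact paths vary by platform, and the design-time/binning instrumentation goes well beyond what's user-visible):

Code:
# Minimal sketch: read whatever on-die temperature sensors the kernel
# exposes through the standard hwmon sysfs interface (paths and driver
# names vary by platform; k10temp/coretemp are just common examples).
from pathlib import Path

def read_hwmon_temps():
    readings = {}
    for hwmon in Path("/sys/class/hwmon").glob("hwmon*"):
        name = (hwmon / "name").read_text().strip()
        for temp_input in hwmon.glob("temp*_input"):
            label_file = hwmon / temp_input.name.replace("_input", "_label")
            label = label_file.read_text().strip() if label_file.exists() else temp_input.name
            millic = int(temp_input.read_text().strip())
            readings[f"{name}/{label}"] = millic / 1000.0  # millidegrees C -> degrees C
    return readings

if __name__ == "__main__":
    for sensor, celsius in sorted(read_hwmon_temps().items()):
        print(f"{sensor}: {celsius:.1f} C")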
 

LightningZ71

Golden Member
Mar 10, 2017
1,403
1,573
136
Simulation tools are great for the actual digital logic design on a theoretical level. They are good, but not perfect, at simulating your intended implementation on fully known and modeled silicon implementations where you are absolutely sure about every behavior in every situation. They are often, at best, an educated guess when you are dealing with what is essentially your leading-edge silicon in one of its largest implementations. Silicon is not an exact science; there are very minute differences in every wafer and chip. What works on 95% of them may not work on that last 5% exactly 100% of the time. Very minor differences in the chemistry of the various layers can make unexpected changes in the timing of signals propagating along a pathway or the behavior of a specific transistor, requiring you to go back and build in additional margin at the silicon level to get your yields to where you want them to be. This is all a vast oversimplification of the process, but simulation can only go so far, and there's a lot that doesn't get captured at the simulation level for designs that are expected to run at the bleeding edge of capability, 24/7, with effectively zero errors.
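To put toy numbers on that margin-vs-yield trade-off, a quick sketch (all delays and distributions invented, nothing like a real STA or yield flow):

Code:
# Toy illustration (not a real STA flow): model the delay of one critical
# path as a normal distribution across dies, then see how the fraction of
# dies meeting a target clock period changes as design margin is added.
import random

def yield_estimate(mean_delay_ps, sigma_ps, period_ps, n_dies=100_000, seed=1):
    rng = random.Random(seed)
    passing = sum(rng.gauss(mean_delay_ps, sigma_ps) <= period_ps for _ in range(n_dies))
    return passing / n_dies

period = 250.0                           # 4 GHz target -> 250 ps cycle
for margin in (0.0, 5.0, 10.0, 20.0):    # extra slack designed in, in ps
    y = yield_estimate(mean_delay_ps=240.0 - margin, sigma_ps=6.0, period_ps=period)
    print(f"designed-in margin {margin:4.1f} ps -> ~{y:.1%} of dies meet 4 GHz")

With the invented numbers above you get roughly the "95% vs that last 5%" situation at zero margin, and a few extra picoseconds of designed-in slack pushes the passing fraction toward 100%.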
 

nicalandia

Platinum Member
Jan 10, 2019
2,192
3,293
106
And even with a final release product with no known bugs, it is always the case that chips built at the end of the product cycle (for example late-build Zen 3) perform better overall than the first release samples, due to what is called "process maturity".
 
  • Like
Reactions: Tlh97 and Vattila

moinmoin

Diamond Member
Jun 1, 2017
3,827
5,597
136
In short (already implied by several of the previous posters):
Simulation is inherently digital, binary, black and white; the analogue reality is inherently grayscale. There can be a lot of interdependence and interference that isn't yet perfectly accounted for in simulations. And this gets harder the smaller the nodes get.

Back to the original topic:
More and better monitoring on the silicon itself helps both to speed up debugging of such corner cases and to optimize the simulation (where with this approach real and simulated sensors can be matched and more closely aligned over time).
 

nicalandia

Platinum Member
Jan 10, 2019
2,192
3,293
106
Sisoftware Sandra just posted an AMD ThreadRipper PRO 5995WX entry in its database for both Native Arithmetic and Processor Multimedia.

So of course it's time for a head-to-head comparison between the top-of-the-line Xeon W9-3495X (ES, only one entry) and the current top-of-the-line ThreadRipper PRO (only one entry).


Intel Xeon W9-3495X: Arithmetic Native: 1,477.63 GOPS
AMD ThreadRipper PRO 5995WX: Arithmetic Native: 1,433 GOPS

Intel Xeon W9-3495X: Processor Multimedia: 7,928.76 Mpix/s
AMD ThreadRipper PRO 5995WX: Processor Multimedia: 6,016.41 Mpix/s


Overall it's a pretty strong showing by Sapphire Rapids, especially flexing its muscle on AVX-512. And while this is the only entry for the 5995WX, and there are other entries for the older model (3995WX) which show higher performance, that is also the only entry for the top-of-the-line Xeon W9...
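For what it's worth, quick arithmetic on the two entries quoted above:

Code:
# Quick relative-performance arithmetic from the Sandra entries quoted above.
results = {
    "Arithmetic Native (GOPS)":      {"Xeon W9-3495X": 1477.63, "TR PRO 5995WX": 1433.00},
    "Processor Multimedia (Mpix/s)": {"Xeon W9-3495X": 7928.76, "TR PRO 5995WX": 6016.41},
}
for test, scores in results.items():
    intel, amd = scores["Xeon W9-3495X"], scores["TR PRO 5995WX"]
    print(f"{test}: Xeon ahead by {(intel / amd - 1) * 100:.1f}%")
# Arithmetic Native: ~3.1% ahead; Processor Multimedia: ~31.8% ahead.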

@Hans de Vries We need your magic here...

Sources:

 
Last edited:
  • Like
Reactions: lightmanek

Doug S

Golden Member
Feb 8, 2020
1,331
1,979
106
Simulation tools are great for the actual digital logic design on a theoretical level. They are good, but not perfect, at simulating your intended implementation on fully known and modeled silicon implementations where you are absolutely sure about every behavior in every situation. They are often, at best, an educated guess when you are dealing with what is essentially your leading-edge silicon in one of its largest implementations. Silicon is not an exact science; there are very minute differences in every wafer and chip. What works on 95% of them may not work on that last 5% exactly 100% of the time. Very minor differences in the chemistry of the various layers can make unexpected changes in the timing of signals propagating along a pathway or the behavior of a specific transistor, requiring you to go back and build in additional margin at the silicon level to get your yields to where you want them to be. This is all a vast oversimplification of the process, but simulation can only go so far, and there's a lot that doesn't get captured at the simulation level for designs that are expected to run at the bleeding edge of capability, 24/7, with effectively zero errors.

I remember some years ago reading about some new chip design (I can't remember the details or even if it was x86 or RISC) where they had successfully booted the OS on it prior to tape-out. They were pretty proud of that accomplishment, and it seemed to be as much about having a simulator capable of that level of performance as much as the successful boot.

What you're talking about here with that "last 5%" is process variation that isn't from defects as such (i.e. it isn't a situation where a core can't pass validation), but where you get a core that can't operate at the target frequency. As I understand it, the simulators can handle timing closure and ensuring there's enough slack between stages to handle the types of issues you describe gracefully. They'd be able to flag e.g. pipeline stages in a given block as a potential timing issue so designers can make changes to address it.

Different companies will handle timing closure differently. If you are Intel or AMD and able to bin everything to the nth degree, you can be pretty aggressive with timing, since that gives you faster bins to sell, but you also have bins for the parts that have the issues you describe ("unexpected changes in the timing of signals propagating along a pathway"). Those can either be sold at the low end, or power is adjusted and they're binned at a higher TDP to achieve the desired frequency. Apple would be forced to have more timing slack since their frequency binning is pass/fail, and parts that can't operate at the target frequency and power are scrapped.
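A toy version of that slack bookkeeping, just to make the idea concrete (made-up arrival times and setup numbers, not a real STA run):

Code:
# Toy slack check in the spirit of static timing analysis: slack is the
# clock period minus (data arrival time + setup requirement); negative
# slack flags a stage that won't close timing. All numbers are made up.
PERIOD_PS = 250.0   # 4 GHz target
SETUP_PS = 15.0

pipeline_stages = {          # stage name -> worst-case data arrival time (ps)
    "fetch":     210.0,
    "decode":    225.0,
    "execute":   245.0,      # this one should get flagged
    "writeback": 200.0,
}

for stage, arrival in pipeline_stages.items():
    slack = PERIOD_PS - (arrival + SETUP_PS)
    status = "OK" if slack >= 0 else "VIOLATION"
    print(f"{stage:10s} arrival={arrival:6.1f} ps  slack={slack:6.1f} ps  {status}")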
 
  • Like
Reactions: Tlh97 and Vattila

LightningZ71

Golden Member
Mar 10, 2017
1,403
1,573
136
Like I said, a vast oversimplification. I remember, back in my days in college for my Computer Engineering degree, using Verilog to design relatively simple processors for various projects or even just for fun (because I had a warped definition of fun back then) and booting an operating system on the simulation. Yes, my own operating systems were quite simple "proof of function" things. Others were just standard implementations of old 8-bit OSes from the past. Running an OS in a simulation environment isn't something astounding. Oh, you wouldn't expect anything approaching hardware-level performance, but with a fast enough system you could prove it works well enough.

Even after all that, with the resources that Intel SHOULD have at their disposal, it seems odd to me that it would take them this many hardware spins of the project to get it to production level. Something isn't quite right here in my view. They are likely pushing the edge really hard somewhere and it's biting them in the rear.
 

Thala

Golden Member
Nov 12, 2014
1,335
645
136
In short (already implied by several of the previous posters):
Simulation is inherently digital, binary, black and white; the analogue reality is inherently grayscale. There can be a lot of interdependence and interference that isn't yet perfectly accounted for in simulations. And this gets harder the smaller the nodes get.
This statement does not seem to come from experience. The vast majority of issues which lead to additional steppings could have been found by RTL simulation. As I explained in my previous post, it is largely a coverage problem, not an inherent problem of the simulation of digital circuits.
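To make the coverage point concrete with a toy example (invented knobs and counts, not any real verification plan): cross a few independent stimulus dimensions and the bin count explodes, and purely random stimulus leaves exactly the kind of corner-case tail that later shows up as an extra stepping.

Code:
# Toy illustration of why coverage is the bottleneck: with a cross of a
# few independent knobs the bin count explodes, and random stimulus
# leaves a long tail of bins (corner cases) unhit.
import itertools, random

opcodes  = range(64)     # pretend ISA knobs
operands = range(16)
pipeline = range(8)      # e.g. which unit / bypass path
bins = set(itertools.product(opcodes, operands, pipeline))   # 8192 cross bins

rng = random.Random(0)
hit = set()
for _ in range(20_000):  # 20k random "tests"
    hit.add((rng.randrange(64), rng.randrange(16), rng.randrange(8)))

print(f"cross bins: {len(bins)}, hit: {len(hit)}, "
      f"still uncovered: {len(bins - hit)} ({len(bins - hit)/len(bins):.1%})")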

Back to the original topic:
More and better monitoring on the silicon itself helps both to speed up debugging of such corner cases and to optimize the simulation (where with this approach real and simulated sensors can be matched and more closely aligned over time).
I have yet to see one of our verification engineers ever simulate a sensor - what do you even expect a simulated sensor to help with? Even our thermal/activity simulations just simulate the heat distribution over the die area and might give you a few hints on where to place the sensors - but the sensors themselves are never simulated.
 
Last edited:

Doug S

Golden Member
Feb 8, 2020
1,331
1,979
106
Like I said, a vast oversimplification. I remember, back in my days in college for my Computer Engineering degree, using Verilog to design relatively simple processors for various projects or even just for fun (because I had a warped definition of fun back then) and booting an operating system on the simulation. Yes, my own operating systems were quite simple "proof of function" things. Others were just standard implementations of old 8-bit OSes from the past. Running an OS in a simulation environment isn't something astounding. Oh, you wouldn't expect anything approaching hardware-level performance, but with a fast enough system you could prove it works well enough.

Even after all that, with the resources that Intel SHOULD have at their disposal, it seems odd to me that it would take them this many hardware spins of the project to get it to production level. Something isn't quite right here in my view. They are likely pushing the edge really hard somewhere and it's biting them in the rear.

You obviously have some pretty direct experience in this arena, so I'll defer to you, but I imagine simulating a modern 64-bit CPU (even with shortcuts for massive but highly regular structures like cache, where circuit-level simulation wouldn't be necessary) booting a bloated modern OS like Windows, or worse something like HP-UX or AIX (which in my not-so-recent experience would best case require several minutes to boot on the highest-end hardware of the day, not counting RAM checks, which the simulated OS would skip), is a totally different animal.

I agree that Intel shouldn't need this many spins with the simulation tools available to them, which means the problems go deeper. So even if they defy all odds and have 20A available at or before the time N2 reaches mass production, that doesn't mean they'll be able to deliver many CPUs made on that process - especially the high-dollar server/workstation CPUs. It is very strange how incompetent they've become since the mid-2010s.
 

nicalandia

Platinum Member
Jan 10, 2019
2,192
3,293
106
As someone asked me on Twitter: why would Intel find so many bugs in Sapphire Rapids if they had no such issues with ADL and now RTL?

I believe that the issues/bugs are not in the core/logic/SDRAM parts of the CPU, but in the whole SoC system (compute tiles, mesh interconnect, HBM, UPI links). I am not a CPU engineer, but trying to simulate such a complex SoC could be more complex than simulating a simple x86_64 CPU.
 

nicalandia

Platinum Member
Jan 10, 2019
2,192
3,293
106
Sisoftware Sandra just posted an AMD ThreadRipper PRO 5995WX entry in its database for both Native Arithmetic and Processor Multimedia.

So of course it's time for a head-to-head comparison between the top-of-the-line Xeon W9-3495X (ES, only one entry) and the current top-of-the-line ThreadRipper PRO (only one entry).
So I made an Intel vs AMD post and no one bats an eye or loses their mind? What's going on here?

Where is @Hans Gruber , where is @Markfw
 

Markfw

CPU Moderator, VC&G Moderator, Elite Member
Super Moderator
May 16, 2002
23,407
12,378
136
So I made an Intel vs AMD post and no one bats an eye or loses their mind? What's going on here?

Where is @Hans Gruber , where is @Markfw
What I saw was what appeared to be a representative post of a benchmark that showed Sapphire Rapids in a decent light. Nobody can argue the facts when there is so little information on SR. And it was NOT comparing against Milan or Genoa, but against the (soon to be) one-gen-back workstation part. I know that the cores are pretty strong, and at 2.5 GHz they're probably not sucking power like crazy. Too bad they did not do something like that with ADL.
 

nicalandia

Platinum Member
Jan 10, 2019
2,192
3,293
106
Nobody can argue the facts when there is so little information on SR. And it was NOT comparing against Milan or Genoa, but against the (soon to be) one-gen-back workstation part.
I am not sure when Intel will be releasing the Workstation W5/W7/W9 Sapphire Rapids-X parts, but by release date they will be compared directly with the Zen 3 ThreadRipper PRO, which was released to the DIY market a month ago. So it will be with us at least until September-October 2023.
 
Last edited:
  • Like
Reactions: Tlh97 and ftt

Markfw

CPU Moderator, VC&G Moderator, Elite Member
Super Moderator
May 16, 2002
23,407
12,378
136
There's no need to kick the hornet's nest...
If we had a retail SR vs a retail Genoa (out close to the same time???) tested by Phoronix or somebody like that who tests server chips, we could discuss the results. But the above test is a yawner: interesting, but not enough information to argue or discuss.

I am not a one-MFG supporter. It's whoever is best at what they are doing (as in desktop, HEDT, server, laptop). Until ADL, there was no competitive product in any area, except maybe mobile.

NOW we can discuss ADL, and soon Zen 4, and probably soon after that Raptor Lake. Not sure when the server world will be competing again; Intel themselves said they would be losing server market share for a while.

It's only a hornet's nest when people refuse to admit the truth.
 

nicalandia

Platinum Member
Jan 10, 2019
2,192
3,293
106
If we had a retail SR vs a retail Genoa (out close to the same time???) tested by Phoronix or somebody like that who tests server chips, we could discuss the results. But the above test is a yawner: interesting, but not enough information to argue or discuss.
This is the thing, Mark. These are not server chips; Sapphire Rapids-X SKUs will be competing with ThreadRipper PRO SKUs. And because AMD has taken longer to update their product line, the TR PRO line is now nearly a generation behind desktop, which only benefits Intel when they release their workstation SKUs.

Will AMD release a Zen 4-based ThreadRipper PRO soon after Intel releases their W9 line? That we will need to wait and see.
 

Markfw

CPU Moderator, VC&G Moderator, Elite Member
Super Moderator
May 16, 2002
23,407
12,378
136
This is the thing, Mark. These are not server chips; Sapphire Rapids-X SKUs will be competing with ThreadRipper PRO SKUs. And because AMD has taken longer to update their product line, the TR PRO line is now nearly a generation behind desktop, which only benefits Intel when they release their workstation SKUs.

Will AMD release a Zen 4-based ThreadRipper PRO soon after Intel releases their W9 line? That we will need to wait and see.
Sorry, not keeping up with all the SKUs, and there are no professional reviews to reference.
 

moinmoin

Diamond Member
Jun 1, 2017
3,827
5,597
136
The vast majority of issues which lead to additional steppings could have been found by RTL simulation. As I explained in my previous post, it is largely a coverage problem, not an inherent problem of the simulation of digital circuits.
That's putting Intel in a worse light though. If I understood you correctly, you are essentially claiming that SPR is more complex than Intel was able to cover with its simulation, and that Intel took on the cost of additional steppings instead.

I have yet to see one of our verification engineers ever simulate a sensor - what do you even expect a simulated sensor to help with? Even our thermal/activity simulations just simulate the heat distribution over the die area and might give you a few hints on where to place the sensors - but the sensors themselves are never simulated.
That seems a little too nitpicky to me. You don't have to simulate the sensor itself, but of course you'd need to simulate whatever the sensor is measuring at the exact place it's located for the two to be comparable (especially with the prevalent issue of hotspots). And that's not only about heat distribution.
 

Exist50

Golden Member
Aug 18, 2016
1,162
1,157
136
I don't know much about CPU development, but I thought that there are some simulation tools available to avoid redoing the silicon again and again.

Where does the information that it is still not ready come from? Would it be possible to get some intel about what is currently wrong with it?

Learning about the past problems and the process of fixing them would be most interesting to many, I believe. Intel should publish a study about it to improve general knowledge of processor development.
So, a few comments on how hardware validation is done. First of all, the tools and methodologies absolutely exist to do this kind of testing, but it's not automatic. You need to create test benches, simulation environments, etc. But there's no theoretical reason that A0 silicon cannot be completely free of RTL bugs. Analog can be somewhat trickier, but an industry-standard way of mitigating that is to have test chips for high-risk IP (think high-speed PHYs, power delivery, etc.). Those are basically a collection of key circuits and some full IPs that you actually manufacture so you can test them post-Si without taping out the full design. You can do test chips for digital as well, and they're also very useful for process learnings.

But back to debug: there are tradeoffs to consider. Ideally, you want to catch bugs as early in the process as possible (i.e. IP or sub-IP level simulation), because that's where the cost to fix them is lowest and you have the most observability into the issue. Larger (e.g. SoC-level) simulation is quite slow but useful for flushing out more bugs, and as you get into FPGAs and emulation, it's much more expensive with lower observability, but helpful for finding yet more bugs. Post-Si is a whole other ballgame, however. A single minute in post-Si can run more full-chip cycles than all of pre-Si validation combined, so it's the most powerful tool for identifying whether a bug exists. However, you have very poor observability into where the bug is coming from, and obviously the lead time to fix it and the cost of doing so are brutal.
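Rough orders of magnitude behind that last point (speeds here are commonly quoted ballparks, not measurements of any particular flow):

Code:
# Back-of-envelope comparison of how many full-chip cycles a minute of
# silicon delivers versus a full-chip RTL simulator or emulator (speeds
# are rough, commonly quoted orders of magnitude, not real measurements).
SILICON_HZ   = 2e9     # a modest 2 GHz part
SIM_HZ       = 100     # full-chip RTL simulation, optimistically ~100 cycles/s
EMULATION_HZ = 1e6     # hardware emulation, roughly ~1 MHz

silicon_cycles_per_min = SILICON_HZ * 60
print(f"silicon, 1 minute: {silicon_cycles_per_min:.1e} cycles")
print(f"same cycle count in RTL sim: {silicon_cycles_per_min / SIM_HZ / 86400 / 365:.0f} simulator-years")
print(f"same cycle count in emulation: {silicon_cycles_per_min / EMULATION_HZ / 86400:.1f} emulator-days")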

The story I heard was that one of Keller's big goals at Intel was rebuilding their pre-Si validation so they wouldn't be stuck in this sort of situation. But I also heard that he took a sort of "damage is done" approach to SPR, and basically had them spam steppings in an attempt to get it to market as fast as possible. Supposedly the original A0 stepping was basically a glorified test chip.

Edit: Also, process issues can easily drive more steppings, even if nothing is wrong with the RTL. But that's clearly insufficient to explain the continued issues with SPR.
 
