Discussion Intel current and future Lakes & Rapids thread

Jul 27, 2020
16,155
10,234
106
I don't know much about CPU development, but I thought that there are simulation tools available to avoid redoing the silicon again and again.
The problem seems to be that such simulation tools for a new manufacturing process are constantly in flux, as more and more data comes in about the various characteristics of the new process.

With each new stepping, they are probably updating the simulation tools in tandem, to prevent the same bugs from appearing in future silicon.
 

Kocicak

Senior member
Jan 17, 2019
982
973
136
Perhaps the goal is not a finished processor, but the development process itself. To keep people employed and entertained.

That reminds me of the Hunger Wall in Prague, a project believed to have been built just to give work and food to the poor.
 
Last edited:

moinmoin

Diamond Member
Jun 1, 2017
4,944
7,656
136
Simulation can't help debug issues on the silicon itself.

I don't know about Intel's current state in that regard, but Keller's most publicized push for change within AMD was for more real-time monitoring of the chip from within, through the addition of hundreds to thousands of sensors as part of the Scalable Control Fabric portion of Infinity Fabric. While these are said to be used for the chip to e.g. optimize power usage and adapt to degradation during use, they are obviously also very helpful during development and binning.
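Purely to illustrate the idea (this is not AMD's actual interface; the sensor names, thresholds and numbers below are all made up), a crude governor built on that kind of on-die telemetry might look something like this:

```python
import random

# Hypothetical on-die telemetry; in reality these values would come from
# hardware registers exposed by something like a control fabric.
def read_sensors():
    return {
        "temp_c": random.uniform(40, 95),      # per-region temperature
        "droop_mv": random.uniform(0, 50),     # voltage droop under load
        "activity": random.uniform(0.0, 1.0),  # per-block utilization
    }

def adjust_operating_point(freq_mhz, s):
    """Toy governor: back off when temperature or droop suggests the
    silicon is near its limits, push back up when there is headroom."""
    if s["temp_c"] > 90 or s["droop_mv"] > 40:
        return max(800, freq_mhz - 100)   # throttle
    if s["activity"] > 0.8 and s["temp_c"] < 70:
        return min(4000, freq_mhz + 50)   # boost
    return freq_mhz

freq = 3000
for _ in range(5):
    s = read_sensors()
    freq = adjust_operating_point(freq, s)
    print(f"temp={s['temp_c']:.0f}C droop={s['droop_mv']:.0f}mV -> {freq} MHz")
```

The same telemetry that drives this kind of loop in the field is exactly what you'd want on a bring-up bench when hunting for the conditions that trip a bug.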

I'd expect that if Intel was lacking in that area before (as SPR seems to indicate), Keller pushed for such improvements at Intel as well.
 

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
Simulation tools are great for the actual digital logic design on a theoretical level. They are good, but not perfect, at simulating your intended implementation on fully known and modeled silicon processes where you are absolutely sure about every behavior in every situation. They are often, at best, an educated guess when you are dealing with what is essentially your leading-edge silicon in one of its largest implementations. Silicon manufacturing is not an exact science; there are minute differences in every wafer and chip. What works on 95% of them may not work on that last 5% exactly 100% of the time. Very minor differences in the chemistry of the various layers can make unexpected changes in the timing of signals propagating along a pathway or the behavior of a specific transistor, requiring you to go back and build in additional margin at the silicon level to get your yields to where you want them to be. This is all a vast oversimplification of the process, but simulation can only go so far, and there's a lot that doesn't get captured at the simulation level for designs that are expected to run at the bleeding edge of capability, 24/7, with effectively zero errors.
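To put a rough number on that "last 5%" idea, here's a toy Monte Carlo sketch - every figure and distribution in it is invented purely for illustration - of how the margin you budget against process variation trades off against yield:

```python
import random

def simulate_yield(margin_ps, n_dies=100_000, sigma_ps=15.0):
    """Toy model: each die's critical path deviates from the designed
    delay by a random process variation (gaussian, sigma_ps picoseconds).
    A die meets frequency if that deviation fits inside the timing
    margin the designers budgeted between the path and the clock."""
    good = sum(1 for _ in range(n_dies)
               if random.gauss(0.0, sigma_ps) <= margin_ps)
    return good / n_dies

# More budgeted margin -> more of the variation tail still closes timing,
# at the cost of a slower design (or lower bins for the stragglers).
for margin in (0, 15, 30, 45):
    print(f"margin {margin:2d} ps -> yield {simulate_yield(margin):.1%}")
```

With one sigma of margin you land right around that "works on most, fails on the rest" situation; it takes roughly three sigma of margin before the stragglers mostly disappear.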
 

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
And even with a final release product with no known bugs, it is always the case that chips built at the end of the product cycle (for example, late-build Zen 3) perform better overall than the first release samples, due to what is called "process maturity".
 
  • Like
Reactions: Tlh97 and Vattila

moinmoin

Diamond Member
Jun 1, 2017
4,944
7,656
136
In short (already implied by several of the previous posters):
Simulation is inherently digital, binary, black and white; analogue reality is inherently grayscale. There can be a lot of interdependence and interference that wasn't yet perfectly accounted for in simulations. And this is getting harder the smaller the nodes get.

Back to the original topic:
More and better monitoring on the silicon itself helps both with speeding up the debugging of such corner cases and with optimizing the simulation (where with this approach real and simulated sensors can be matched and more closely aligned over time).
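One way to read that last point, with completely made-up numbers: you take what the real sensors report, fit a correction to the model's predictions, and feed the corrected model back into the next round of simulation.

```python
# Toy calibration sketch (numbers invented): align a thermal model's
# predictions with what on-die sensors actually measured, via a simple
# least-squares scale/offset fit.
measured  = [62.0, 71.5, 80.2, 88.9]   # degrees C from on-die sensors
simulated = [60.0, 68.0, 76.0, 84.0]   # model prediction at the same points

n = len(measured)
mean_m = sum(measured) / n
mean_s = sum(simulated) / n
scale = sum((s - mean_s) * (m - mean_m) for s, m in zip(simulated, measured)) \
        / sum((s - mean_s) ** 2 for s in simulated)
offset = mean_m - scale * mean_s

corrected = [scale * s + offset for s in simulated]
print(f"scale={scale:.3f}, offset={offset:.2f}")
print("corrected predictions:", [round(c, 1) for c in corrected])
```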
 

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
SiSoftware Sandra just posted an AMD ThreadRipper PRO 5995WX entry in its database for both Native Arithmetic and Processor Multimedia.

So of course it's time for a vs. comparison between the top-of-the-line Xeon W9-3495X (ES, only one entry) and the current top-of-the-line ThreadRipper PRO (only one entry).


Intel Xeon W9-3495X: Arithmetic Native: 1,477.63GOPS

AMD ThreadRipper PRO 5995WX: Arithmetic Native: 1,433GOPS


Intel Xeon W9-3495X: Processor Multimedia: 7,928.76Mpix/s


AMD ThreadRipper PRO 5995WX: Processor Multimedia: 6,016.41Mpix/s



Overall it's a pretty strong showing by Sapphire Rapids, especially flexing its muscle on AVX-512. And while this is the only entry for the 5995WX, and there are entries for the older model (3995WX) with higher scores, it is also the only entry for the top-of-the-line Xeon W9...
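For what it's worth, the margins in those entries work out to roughly:

```python
# Percent margins from the Sandra entries quoted above.
arith = (1477.63 - 1433.0) / 1433.0 * 100      # Arithmetic Native (GOPS)
mm    = (7928.76 - 6016.41) / 6016.41 * 100    # Processor Multimedia (Mpix/s)
print(f"W9-3495X ahead by {arith:.1f}% in arithmetic and {mm:.1f}% in multimedia")
```

About 3% in arithmetic and about 32% in multimedia, the latter presumably being where AVX-512 does the heavy lifting.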

@Hans de Vries We need your magic here...

Sources:

 
Last edited:
  • Like
Reactions: lightmanek

Doug S

Platinum Member
Feb 8, 2020
2,251
3,481
136
Simulation tools are great for the actual digital logic design on a theoretical level. They are good, but not perfect, at simulating your intended implementation on fully known and modeled silicon processes where you are absolutely sure about every behavior in every situation. They are often, at best, an educated guess when you are dealing with what is essentially your leading-edge silicon in one of its largest implementations. Silicon manufacturing is not an exact science; there are minute differences in every wafer and chip. What works on 95% of them may not work on that last 5% exactly 100% of the time. Very minor differences in the chemistry of the various layers can make unexpected changes in the timing of signals propagating along a pathway or the behavior of a specific transistor, requiring you to go back and build in additional margin at the silicon level to get your yields to where you want them to be. This is all a vast oversimplification of the process, but simulation can only go so far, and there's a lot that doesn't get captured at the simulation level for designs that are expected to run at the bleeding edge of capability, 24/7, with effectively zero errors.


I remember some years ago reading about some new chip design (I can't remember the details or even if it was x86 or RISC) where they had successfully booted the OS on it prior to tape-out. They were pretty proud of that accomplishment, and it seemed to be as much about having a simulator capable of that level of performance as much as the successful boot.

What you're talking about here with that "last 5%" is process variation that isn't from defects as such (i.e. it isn't a situation where a core can't pass validation) but where you get a core that can't operate at the target frequency. As I understand it, the simulators can handle timing closure and ensure there's enough slack between stages to handle the types of issues you describe gracefully. They'd be able to flag e.g. pipeline stages in a given block as a potential timing issue so designers can make changes to address it.
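For anyone who hasn't run into the term, "slack" in static timing analysis is just the difference between when a signal is required at the next stage and when it actually arrives; negative slack means the path fails at that clock. A toy register-to-register calculation (all delays invented):

```python
def setup_slack(clock_period_ps, clk_to_q_ps, logic_delay_ps,
                wire_delay_ps, setup_time_ps):
    """Setup slack for a register-to-register path: data must arrive
    one setup time before the capturing clock edge.  Negative slack
    means the path fails timing at this clock period."""
    arrival  = clk_to_q_ps + logic_delay_ps + wire_delay_ps
    required = clock_period_ps - setup_time_ps
    return required - arrival

# Invented numbers for a pipeline stage at ~3 GHz (333 ps period).
nominal = setup_slack(333, clk_to_q_ps=25, logic_delay_ps=250,
                      wire_delay_ps=40, setup_time_ps=15)
# Same path when process variation makes the wires 10% slower.
slow = setup_slack(333, clk_to_q_ps=25, logic_delay_ps=250,
                   wire_delay_ps=44, setup_time_ps=15)
print(f"nominal slack: {nominal} ps, slow-corner slack: {slow} ps")
```

A path that closes with only a few picoseconds of slack at the nominal corner is exactly the kind the tools flag, since a slightly slow wafer pushes it negative.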

Different companies will handle timing closure differently. If you are Intel or AMD and able to bin everything to the nth degree, you can be pretty aggressive with timing, since that gives you faster bins to sell, but you also have bins for the parts that have the issues you describe ("unexpected changes in the timing of signals propagating along a pathway"). They can either be sold at the low end, or power is adjusted and they're binned at a higher TDP to achieve a desired frequency. Apple would be forced to have more timing slack since their frequency binning is pass/fail, and parts that can't operate at the target frequency and power are scrapped.
 
  • Like
Reactions: Tlh97 and Vattila

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
Like I said, a vast oversimplification. I remember, back in my days in college for my Computer Engineering degree, using Verilog to design relatively simple processors for various projects or even just for fun (because I had a warped definition of fun back then) and booting an operating system on the simulation. Yes, my own operating systems were quite simple "proof of function" things. Others were just standard implementations of old 8-bit OSes from the past. Running an OS in a simulation environment isn't something astounding. Oh, you wouldn't expect anything approaching hardware-level performance, but with a fast enough system you could prove it works well enough.

Even after all that, with the resources that Intel SHOULD have at their disposal, it seems odd to me that it would take them this many hardware spins of the project to get it to production level. Something isn't quite right here in my view. They are likely pushing the edge really hard somewhere and it's biting them in the rear.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
In short (already implied by several of the previous posters):
Simulation is inherently digital, binary, black and white; analogue reality is inherently grayscale. There can be a lot of interdependence and interference that wasn't yet perfectly accounted for in simulations. And this is getting harder the smaller the nodes get.

This statement does not seem to come from experience. The vast majority of issues which lead to additional steppings could have been found by RTL simulation. As I explained in my previous post, it is largely a coverage problem, not an inherent problem of simulating digital circuits.
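To put a toy number on the coverage point (everything here is invented, it's just arithmetic): suppose a bug only triggers when both 16-bit operands of some unit are all-ones while a carry comes in on the same cycle.

```python
# Chance of hitting that corner with purely random, unbiased stimulus:
p_hit = (1 / 2**16) * (1 / 2**16) * (1 / 2)
print(f"probability per random vector: {p_hit:.2e}")
print(f"expected vectors before the bug is seen: {1 / p_hit:,.0f}")
```

That's on the order of ten billion vectors, which on a full-chip simulation crawling along at a few hundred cycles per second is effectively never. Hence constrained-random stimulus, functional coverage points and directed tests: the hard part is making sure the interesting corners actually get exercised, not whether the simulator could model them.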

Back to the original topic:
More and better monitoring on the silicon itself helps both with speeding up the debugging of such corner cases and with optimizing the simulation (where with this approach real and simulated sensors can be matched and more closely aligned over time).

I have yet to see one of our verification engineers ever simulate a sensor - what are you expecting a simulated sensor could help with? Even our thermal/activity simulations just simulate the heat distribution over the die area and might give you a few hints on where to place the sensors - but the sensors themselves are never simulated.
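What does get simulated looks more like this (a cartoon version, with the grid, power map and constants all invented): diffuse a per-block power map across the die and see where the hot spots end up, which is where you'd want to drop the physical sensors.

```python
# Cartoon thermal/activity simulation used to pick candidate sensor sites.
W, H = 8, 6                       # die modelled as an 8x6 grid of tiles
power = [[0.1] * W for _ in range(H)]
power[1][2] = power[1][3] = 1.0   # a hot block (say, a vector unit cluster)
power[4][6] = 0.8                 # another busy block

temp = [[25.0] * W for _ in range(H)]    # start at ambient
for _ in range(200):                     # crude explicit diffusion steps
    new = [row[:] for row in temp]
    for y in range(H):
        for x in range(W):
            nbrs = [temp[ny][nx]
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
                    if 0 <= ny < H and 0 <= nx < W]
            lateral = sum(nbrs) / len(nbrs) - temp[y][x]
            new[y][x] = (temp[y][x] + 0.2 * lateral
                         + 0.5 * power[y][x] - 0.05 * (temp[y][x] - 25.0))
    temp = new

# The hottest tiles are the obvious candidates for physical thermal sensors.
hottest = sorted(((temp[y][x], (x, y)) for y in range(H) for x in range(W)),
                 reverse=True)[:3]
for t, (x, y) in hottest:
    print(f"candidate sensor site at tile ({x},{y}): ~{t:.1f} C")
```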
 
Last edited:

Doug S

Platinum Member
Feb 8, 2020
2,251
3,481
136
Like I said, a vast oversimplification. I remember, back in my days in college for my Computer Engineering degree, using Verilog to design relatively simple processors for various projects or even just for fun (because I had a warped definition of fun back then) and booting an operating system on the simulation. Yes, my own operating systems were quite simple "proof of function" things. Others were just standard implementations of old 8-bit OSes from the past. Running an OS in a simulation environment isn't something astounding. Oh, you wouldn't expect anything approaching hardware-level performance, but with a fast enough system you could prove it works well enough.

Even after all that, with the resources that Intel SHOULD have at their disposal, it seems odd to me that it would take them this many hardware spins of the project to get it to production level. Something isn't quite right here in my view. They are likely pushing the edge really hard somewhere and it's biting them in the rear.


You obviously have some pretty direct experience in this arena so I'll defer to you, but I imagine simulating a modern 64-bit CPU (even with shortcuts for massive but highly regular structures like cache, where circuit-level simulation wouldn't be necessary) booting a bloated modern OS like Windows, or worse something like HP-UX or AIX (which in my not-so-recent experience would at best require several minutes to boot on the highest-end hardware of the day, not counting RAM checks, which the simulated OS would skip), is a totally different animal.
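Back-of-the-envelope, with very rough assumed speeds (not measurements from any particular tool): full-chip software RTL simulation tends to crawl along at somewhere in the hundreds of cycles per second, hardware emulation in the low MHz, so the gap to a real boot is brutal.

```python
# Assumed ballpark figures, just to show the scale gap.
boot_cycles = 30 * 3e9    # ~30 s of wall-clock boot time at ~3 GHz
for name, cycles_per_sec in (("software RTL simulation", 500),
                             ("hardware emulation",      2e6),
                             ("real silicon",            3e9)):
    days = boot_cycles / cycles_per_sec / 86400
    print(f"{name:25s}: {days:10,.2f} days")
```

Which is why pre-tapeout "we booted the OS" bragging rights usually involve emulation boxes or FPGA prototypes rather than pure software simulation, and a heavily stripped-down boot path at that.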

I agree that Intel shouldn't need this many spins with the simulation tools available to them, which means the problems go deeper. So even if they defy all odds and have 20A available at or before the time N2 reaches mass production, that doesn't mean they'll be able to deliver many CPUs made with that process - especially the high-dollar server/workstation CPUs. It is very strange how incompetent they've become since the mid-2010s.
 

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
As someone asked me on Twitter: why would Intel find so many bugs in Sapphire Rapids if Intel had no issues with ADL and now with RPL?

I believe that the issues/bugs are not in the Core/Logic/SDRAM parts of the CPU, but in the whole SoC system (compute tiles, mesh interconnect, HBM, UPI links). I am not a CPU engineer, but simulating such a complex SoC could be more complex than simulating a simple x86_64 CPU.
 

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
SiSoftware Sandra just posted an AMD ThreadRipper PRO 5995WX entry in its database for both Native Arithmetic and Processor Multimedia.

So of course it's time for a vs. comparison between the top-of-the-line Xeon W9-3495X (ES, only one entry) and the current top-of-the-line ThreadRipper PRO (only one entry).
So I made an Intel vs AMD post and no one bats an eye or loses their mind? What's going on here?

Where is @Hans Gruber , where is @Markfw
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,541
14,495
136
So I made an Intel vs AMD post and no one bats an eye or loses their mind? What's going on here?

Where is @Hans Gruber , where is @Markfw
What I saw was what appeared to be a representative post of a benchmark that showed Sapphire Rapids in a decent light. Nobody can argue facts when there is so little information on SR. And it was NOT comparing against Milan or Genoa, but against the (soon to be) one-generation-back workstation part. I know that the cores are pretty strong, and at 2.5 GHz they are probably not sucking power like crazy. Too bad they did not do something like that with ADL.
 

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
Nobody can argue facts when there is so little information on SR. And it was NOT comparing against Milan or Genoa, but against the (soon to be) one-generation-back workstation part.
I am not sure when Intel will be releasing the Workstation W5/W7/W9 Sapphire Rapids-X SKUs, but by release date they will be compared directly with the Zen 3 ThreadRipper PRO, which was released to the DIY market a month ago. So it will be with us until at least September-October 2023.
 
Last edited:
  • Like
Reactions: Tlh97 and ftt

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,541
14,495
136
There's no need to kick the hornet's nest...
If we had a retail SR vs a retail Genoa (out close to the same time???) tested by Phoronix or somebody like that who tests server chips, we could discuss the results. But the above test is a yawner: interesting, but not enough information to argue or discuss.

I am not a one-manufacturer supporter. It's whoever is best at what they are doing (as in desktop, HEDT, server, laptop). Until ADL, there was no competitive product in any area, except maybe mobile.

NOW we can discuss ADL, and soon Zen 4, and probably soon after that Raptor Lake. Not sure when the server world will be competing again; Intel themselves said they would be losing server market share for a while.

It's only a hornet's nest when people refuse to admit the truth.
 

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
If we had a retail SR vs a retail Genoa (out close to the same time???) tested by Phoronix or somebody like that who tests server chips, we could discuss the results. But the above test is a yawner: interesting, but not enough information to argue or discuss.
This is the thing, Mark. These are not server chips; Sapphire Rapids-X SKUs will be competing with ThreadRipper PRO SKUs. Because AMD has taken longer to update their product line, the TR PRO line is now nearly a generation behind desktop, which only benefits Intel when they release their workstation SKUs.

Will AMD release a Zen 4-based ThreadRipper PRO soon after Intel releases their W9 line? That we will have to wait and see.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,541
14,495
136
This is the thing, Mark. These are not server chips; Sapphire Rapids-X SKUs will be competing with ThreadRipper PRO SKUs. Because AMD has taken longer to update their product line, the TR PRO line is now nearly a generation behind desktop, which only benefits Intel when they release their workstation SKUs.

Will AMD release a Zen 4-based ThreadRipper PRO soon after Intel releases their W9 line? That we will have to wait and see.
Sorry, not keeping up with all the SKUs, and there are no professional reviews to reference.