Discussion Intel current and future Lakes & Rapids thread


moinmoin

Diamond Member
Jun 1, 2017
4,933
7,619
136
The vast majority of issues which lead to additional steppings could have been found by RTL simulation. As I explained in my previous post, it is largely a coverage problem, not an inherent problem with simulating digital circuits.
That's putting Intel in a worse light, though. If I understood you correctly, you are essentially claiming SPR is more complex than Intel was able to cover with its simulation, and that Intel took on the cost of additional steppings instead.

I have yet to see one of our verification engineers ever simulate a sensor - what are you even expecting a simulated sensor could help with? Even our thermal/activity simulations just simulate the heat distribution over the die area and might give you a few hints where to place the sensors - but the sensors themselves are never simulated.
That seems a little too nitpicky to me. You don't have to simulate the sensor itself, but of course you'd need to simulate whatever the sensor is measuring at the exact place it's located for the results to be comparable (especially with the prevalent issue of hotspots). And that's not only for heat distribution.
 

Exist50

Platinum Member
Aug 18, 2016
2,445
3,043
136
I don't know much about CPU development, but I thought that there are simulation tools available to avoid redoing the silicon again and again.

Where does the information that it is still not ready come from? Would it be possible to get some intel about what is currently wrong with it?

Learning about the past problems and the process of fixing them would be most interesting to many, I believe. Intel should publish a study about it to improve general knowledge of processor development.
So, a few comments on how hardware validation is done. First of all, the tools and methodologies absolutely exist to do this kind of testing, but it's not automatic. You need to create test benches, simulation environments, etc. But there's no theoretical reason that A0 silicon cannot be completely free of RTL bugs. Analog can be somewhat trickier, but an industry-standard way of mitigating that is to have test chips for high-risk IP (think high-speed PHYs, power delivery, etc.). Those are basically a collection of key circuits and some full IPs that you actually manufacture so you can test them post-Si without taping out the full design. You can do test chips for digital as well, and they're also very useful for process learnings.
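To make the "test benches, simulation environments" part a bit more concrete, here's a minimal sketch of the kind of pre-Si RTL test you'd run thousands of, written with cocotb (a Python co-simulation framework). The `adder` DUT and its `a`/`b`/`sum` ports are invented for illustration; real benches also track functional coverage, which is exactly where the "coverage problem" mentioned above bites:

```python
import random

import cocotb
from cocotb.triggers import Timer


@cocotb.test()
async def adder_random_test(dut):
    """Drive random stimulus into a hypothetical 'adder' DUT and
    check it against a golden model. Port names are invented."""
    for _ in range(1000):
        a = random.randint(0, 255)
        b = random.randint(0, 255)
        dut.a.value = a
        dut.b.value = b
        await Timer(2, units="ns")  # let the combinational logic settle
        got = int(dut.sum.value)
        assert got == a + b, f"adder bug: {a} + {b} returned {got}"
```

Random tests like this only find bugs in the states they happen to reach - hence the whole discipline of coverage closure.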

But back to debug, there are tradeoffs to consider. Ideally, you want to catch bugs as early in the process as possible (i.e. IP- or sub-IP-level simulation), because that's where the cost of fixing them is lowest and you have the most observability into the issue. Larger (e.g. SoC-level) simulation is quite slow but useful for flushing out more bugs, and as you get into FPGAs and emulation, things become much more expensive with lower observability, but helpful for finding yet more bugs. Post-Si is a whole other ballgame, however. A single minute in post-Si can run more full-chip cycles than all of pre-Si validation combined, so it's the most powerful tool for identifying whether a bug exists. However, you have very poor observability into where the bug is coming from, and obviously the lead time to fix it and the cost of doing so are brutal.
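Just to put rough numbers on that last point (these are generic order-of-magnitude throughput figures one hears for each stage, not Intel data):

```python
# Ballpark full-chip cycle throughput per validation stage.
# Generic order-of-magnitude figures, not Intel data.
STAGE_HZ = {
    "full-chip RTL simulation": 10,   # tens of cycles per second
    "emulation": 1_000_000,           # ~1M cycles per second
}

POST_SI_HZ = 2e9                        # a 2 GHz part
cycles_in_one_minute = POST_SI_HZ * 60  # 1.2e11 full-chip cycles

for stage, hz in STAGE_HZ.items():
    days = cycles_in_one_minute / hz / 86400
    print(f"{stage}: ~{days:,.1f} days to match one minute of silicon")
```

At those (assumed) rates, one minute of silicon equals centuries of full-chip RTL simulation - which is why post-Si is unbeatable for *detecting* bugs and terrible for *localizing* them.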

The story I heard was that one of Keller's big goals at Intel was rebuilding their pre-Si validation so they wouldn't be stuck in this sort of situation. But I also heard that he took a sort of "damage is done" approach to SPR, and basically had them spam steppings in an attempt to get it to market as fast as possible. Supposedly the original A0 stepping was essentially a glorified test chip.

Edit: Also, process issues can easily drive more steppings, even if nothing is wrong with the RTL. But that's clearly insufficient to explain the continued issues with SPR.
 

LightningZ71

Golden Member
Mar 10, 2017
1,627
1,898
136
And, on a very fundamental level, we STILL don't have a good handle on how Intel 7 is actually yielding on the various fin combinations. We can externally see that Intel has been improving yields enough to make volume on progressively larger and larger chips (Ice Lake on volume 10nm, Tiger Lake on SuperFin, the larger 8-core Tiger Lake on SuperFin [plus Ice Lake server at low speeds with a big die], the still larger 8+8 Alder Lake on Enhanced SuperFin, and finally 8+16 Raptor Lake on ESF/Intel 7). We don't know if Intel is still having yield issues on "7" that are greater than what their hedging on an extra core and redundant pathways can make up for. In addition, Sapphire Rapids is arguably one of the more complex packages they've attempted in volume. There's still a lot that can go very wrong.
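To illustrate why die size and that "extra core" hedge matter so much, here's a standard textbook Poisson yield model with invented defect densities (not actual Intel 7 numbers):

```python
import math

def poisson_yield(area_mm2: float, d0_per_cm2: float) -> float:
    """Classic Poisson yield model: Y = exp(-A * D0)."""
    return math.exp(-(area_mm2 / 100.0) * d0_per_cm2)

D0 = 0.2  # defects/cm^2 -- an invented, illustrative number

for name, area in [("small mobile die", 100),
                   ("8+8 desktop die", 210),
                   ("big server tile", 400)]:
    print(f"{name} ({area} mm^2): {poisson_yield(area, D0):.0%} defect-free")

# The hedge: with 8 cores and one core allowed to be dead, a die is
# sellable if at most one core is defective (ignoring uncore defects).
core_yield = poisson_yield(10, D0)  # 10 mm^2 per core, also invented
perfect = core_yield ** 8
salvage = perfect + 8 * core_yield ** 7 * (1 - core_yield)
print(f"8-core salvage (<=1 dead core): {salvage:.0%} vs {perfect:.0%} perfect")
```

Even with made-up numbers, the shape is clear: yield falls off exponentially with area, and accepting one dead core recovers a big chunk of otherwise-scrapped dies.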

Simulation only takes you so far. Eventually, you put it all together and watch the train derail in new and exciting ways, until it doesn't.
 
Jul 27, 2020
15,738
9,806
106
I believe that the issues/bugs are not in the Core/Logic/SDRAM parts of the CPU, but in the whole SoC system (compute tiles, mesh interconnect, HBM, UPI links). I am not a CPU engineer, but trying to simulate such a complex SoC could well be harder than simulating a simple x86-64 CPU.
I believe they hired some ex-AMD engineers from the Bulldozer days...
 

NostaSeronx

Diamond Member
Sep 18, 2011
3,683
1,218
136
I believe they hired some ex-AMD engineers from the Bulldozer days...
SoftMachines/AMD-related (Bulldozer-related - 2001-2013) engineers were dropped from Core cores/SoCs and shifted to Atom cores/SoCs. Sapphire Rapids is on the standard team stack, not the refreshed American E-core/E-SoC team stack.

They did one P-core project and it never came out. Specifically, it never got into the roadmap.
 

name99

Senior member
Sep 11, 2010
404
303
136
I actually wish we could do just that, fly on the wall style! Not in a malicious way, I just think post mortems of such developments anywhere are very interesting and insightful.
You can for Wolfram (the company that makes Mathematica). Their design meetings are open and available on YouTube, and have been for at least five years.
Mathematica probably has the best bugs-to-complexity ratio of any software on earth (at least that I know of; maybe avionics is better - but avionics may also not be *that* complex a problem, and doesn't have as long a tail of backward compatibility?)
 

nicalandia

Diamond Member
Jan 10, 2019
3,330
5,281
136
Intel GeTi


I don't care about AI, can we get to the info on Sapphire Rapids, Raptor Lake and other hardware?
 

Hitman928

Diamond Member
Apr 15, 2012
5,177
7,628
136
Pat just called it the i3-13900k. Whoops. Hopefully they provide some real info with the announcement.