What is known about how Intel tests their CPUs?


pm

Elite Member Mobile Devices
Jan 25, 2000
7,419
22
81
Ahh, I see. My misunderstanding, thanks for clearing it up.

How would the errata fit in with this?

Errata are nearly always functional bugs, and thus far the discussion has been on electrical bugs. A functional bug is a logic error - something in the chip was implemented in such a way that it doesn't match the architectural specification. An infamous example was the Intel Pentium floating-point divide (FDIV) bug: for certain pairs of operands, dividing two floating-point numbers gave the wrong answer every time. An electrical bug is one where the circuitry is implemented in such a way that the chip doesn't work across its full operating "window" of voltage, power, thermal, or process. I'm sure there's a better academic way to explain the difference between electrical and functional bugs, but the big difference to me is that with an electrical issue the chip gives the right answer at some mix of voltage/temperature/frequency, and with a functional issue the chip always gives the wrong answer.
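To make the rule of thumb above concrete, here is a minimal sketch in Python. It is only an illustration of the heuristic, not an actual Intel flow: the `classify_failure` name and the dictionary layout are assumptions made up for this example. Note the heuristic's limit: an electrical problem severe enough to fail at every operating point would look "functional" here.

```python
from typing import Dict, Tuple

# Hypothetical layout: (voltage in V, temperature in C, frequency in GHz) -> did the test pass?
OperatingPoint = Tuple[float, float, float]


def classify_failure(results: Dict[OperatingPoint, bool]) -> str:
    """Classify a test by how it behaves across the operating window."""
    if not results:
        raise ValueError("need at least one operating point")
    passes = sum(results.values())
    if passes == len(results):
        return "no bug observed"
    if passes == 0:
        # Wrong answer at every point: consistent with a logic (functional) error.
        return "functional"
    # Right answer at some points, wrong at others: the failure depends on V/T/F.
    return "electrical"


# Made-up data: the test fails only at the low-voltage corner.
low_voltage_fail = {
    (0.9, 25.0, 3.0): False,
    (1.0, 25.0, 3.0): True,
    (1.1, 90.0, 3.0): True,
}
# Made-up data: the test fails everywhere.
always_fail = {point: False for point in low_voltage_fail}
```

Calling `classify_failure(low_voltage_fail)` gives "electrical", while `classify_failure(always_fail)` gives "functional".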

Electrical and functional debug and testing are very different. I'm not sure if all companies are like this, but at Intel the teams work almost completely separately (except to toss a bug back and forth). So the goal with electrical debug is to find, debug, fix and come up with tests to make sure that the chips can run within the electrical specification - if the specification says the chip can run down as low as 0.9V, the electrical guys will check that it does, and if it doesn't, they debug the issue, issue a fix, and usually come up with a test so that we check for it in the future. With functional debug, the goal is to find, debug and fix issues where the chip doesn't match the architectural specification (which is located here if you are curious).
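As a rough illustration of the "does it really run down to 0.9V" check, here is a small voltage-sweep sketch in Python. Everything except the 0.9 V figure is an assumption for the example: `run_test` stands in for whatever real test content would be run on the part, and `toy_part` with its 0.95 V threshold is a made-up stand-in for silicon.

```python
from typing import Callable, List, Optional


def lowest_passing_voltage(run_test: Callable[[float], bool],
                           voltages: List[float]) -> Optional[float]:
    """Return the lowest swept voltage at which run_test passes, or None if it never does."""
    passing = [v for v in voltages if run_test(v)]
    return min(passing) if passing else None


def meets_spec(run_test: Callable[[float], bool],
               voltages: List[float],
               spec_min_v: float = 0.9) -> bool:
    """True if the part passes at every swept voltage at or above the spec minimum."""
    return all(run_test(v) for v in voltages if v >= spec_min_v)


def toy_part(voltage: float) -> bool:
    """Hypothetical part that only works at 0.95 V and above."""
    return voltage >= 0.95


# Sweep grid chosen for the example; a real sweep would also vary temperature and frequency.
sweep = [0.85, 0.90, 0.95, 1.00, 1.05]
```

With this toy part, `meets_spec(toy_part, sweep)` is False because the part fails at 0.90 V, which is inside the specified window. That is the kind of result that would kick off an electrical debug.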

So they check that assembly instructions behave as they are supposed to when you execute them. I personally think that functional validation and debug are a lot harder than electrical - not sure what the functional guys think about electrical, although usually they seem mystified by it, so maybe the feeling is mutual. So with functional validation you want to find all the corner cases in the design - what happens when various buffers overflow, what happens when some funky cache situation arises where one core has a cache line and two others want it, and so on. The goal of functional validation is to find, debug and fix all logical/functional bugs prior to launch.
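A toy version of the buffer-overflow corner case can be sketched in Python by comparing a design against a reference model of the spec over every short sequence of operations. This is a simplified illustration, not how validation is actually done on a CPU: `ReferenceFifo`, `BuggyFifo` and `find_mismatch` are hypothetical names, and the "spec" (a push into a full FIFO is dropped) is an assumption made for the example.

```python
from collections import deque
from itertools import product


class ReferenceFifo:
    """Spec model (assumed for this example): a push into a full FIFO is dropped."""

    def __init__(self, depth):
        self.depth = depth
        self.items = deque()

    def push(self, value):
        if len(self.items) < self.depth:
            self.items.append(value)

    def pop(self):
        return self.items.popleft() if self.items else None


class BuggyFifo(ReferenceFifo):
    """Hypothetical design with an overflow bug: a push when full evicts the oldest entry."""

    def push(self, value):
        if len(self.items) == self.depth:
            self.items.popleft()
        self.items.append(value)


def find_mismatch(design_cls, depth=2, max_len=5):
    """Try every push/pop sequence up to max_len; return the first one that diverges from the spec."""
    for length in range(1, max_len + 1):
        for ops in product(("push", "pop"), repeat=length):
            ref, dut = ReferenceFifo(depth), design_cls(depth)
            for i, op in enumerate(ops):
                if op == "push":
                    ref.push(i)
                    dut.push(i)
                elif ref.pop() != dut.pop():
                    return ops
    return None
```

Here `find_mismatch(BuggyFifo)` returns three pushes followed by a pop, which is the shortest sequence that overflows a depth-2 buffer and then observes the damage. Exhaustive enumeration only works for tiny models like this one; the number of sequences doubles with every extra operation, which hints at why corner cases in a real design are so hard to cover.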

Errata come up when either team messes up prior to product launch and we release a product that has customer-visible bugs. Generally errata are functional bugs. Errata are an unfortunate effect of very complex designs and a limited schedule to find bugs. That said, as an engineer involved in the effort, I'm always amazed that the chips come out as functionally and electrically clean as they do.
 

Dufus

Senior member
Sep 20, 2010
675
119
101
I hope I'm not being OT by discussing the functional side of testing, let me know if I am and I'll be quiet. I was curious as to whether most of the errata were found pre mass production but not considered a show stopper or found later on. It sounds as though from what you have said that they come after release. For instance the FDIV bug you mentioned, AFAIK that was discovered by a 3rd party. I even managed to discover a bug myself by accident although I wouldn't be surprised if Intel already knew about it but decided not to document it in the specification update.
 

TuxDave

Lifer
Oct 8, 2002
10,571
3
71
Not sure about most but every so often we hit one of those:
"What's this? It's not doing what it should be doing? It'll take a complete redesign to make it do THAT! Who uses it under that condition? No one? Ok... errata"
 

veri745

Golden Member
Oct 11, 2007
1,163
4
81
Are you using "electrical bug" to mean a manufacturing defect, or a systemic issue with the design and manufacture of the part (i.e. all parts exhibit this "electrical bug")?
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
Both.

Electrical debug includes yield issues, typically caused by layout that is vulnerable to poor consistency in manufacturing. Those can be considered a manufacturing defect or weak spot.

Another bucket is the failure to run within the specified operating windows of speed, voltage and temperature. Depending on the issue, it can be considered a systemic design problem, because electrical issues can affect anything from a handful of samples to every part being DOA (the worst-case scenario).
 