how is cpu speed binning done?

OS

Lifer
Oct 11, 1999
15,581
1
76

How does the manufacturer test CPUs to know what speed they are appropriate for?

 

Bovinicus

Diamond Member
Aug 8, 2001
3,145
0
0
It really depends on the company and which CPU you are talking about. The validation process for server CPUs is more rigorous than for desktop CPUs. Also, sometimes when yields are really good, they clock cores at slower speeds than they are actually capable of running (which is how overclockers can do what they do).
 

pm

Elite Member Mobile Devices
Jan 25, 2000
7,419
22
81
As Bovinicus mentioned, it depends.

In general though, and simplifying the process a bit, the CPU is loaded into a tester (such as the multi-million dollar Schlumberger ITS 9000 functional tester) that contains all of the worst-case code that that company can come up with (these are called "tests" or "patterns"). The CPU is then run at a variety of voltages, frequencies, and temperatures. The passing and failing test results, together with the temperature/voltage/frequency data, create a 2D array of data that is compared against the specifications set for that product, and the bin is determined by comparing the two. This is probably the most traditional approach. There are other methods that use test technology designed into the chip itself, or a hybrid approach.
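To make that a little more concrete, here's a rough sketch in Python of how a grid of pass/fail results over frequency and voltage might be reduced to a speed bin. The frequencies, voltages, bin cutoffs, and the pretend "fmax" model are all made up for illustration; a real test flow is far more involved than this.

Code:
# Toy illustration of speed binning from pass/fail data.
# All numbers here are invented; real test flows are much more complex.

# Candidate operating points swept by the (hypothetical) tester.
frequencies_mhz = [1800, 2000, 2200, 2400, 2600]
voltages_v = [1.40, 1.45, 1.50]

# Speed bins the product could be sold as, fastest first.
bins_mhz = [2400, 2200, 2000, 1800]

def part_passes(freq_mhz, voltage_v, true_fmax_mhz):
    """Stand-in for running the worst-case patterns at one corner.
    We just pretend the die has some 'true' maximum frequency that
    improves slightly with voltage."""
    effective_fmax = true_fmax_mhz * (1 + 0.10 * (voltage_v - 1.40))
    return freq_mhz <= effective_fmax

def bin_part(true_fmax_mhz, nominal_voltage=1.40):
    """Build the 2D pass/fail grid, then pick the highest bin that
    passes at the nominal voltage at every frequency up to that bin."""
    grid = {(f, v): part_passes(f, v, true_fmax_mhz)
            for f in frequencies_mhz for v in voltages_v}
    for target in bins_mhz:                      # try the fastest bin first
        if all(grid[(f, nominal_voltage)]
               for f in frequencies_mhz if f <= target):
            return target
    return None                                  # fails even the slowest bin

if __name__ == "__main__":
    for fmax in (2500, 2250, 1900, 1700):
        result = bin_part(fmax)
        label = f"{result} MHz part" if result else "scrap"
        print(f"die with true fmax ~{fmax} MHz -> sold as {label}")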
 

OS

Lifer
Oct 11, 1999
15,581
1
76
Originally posted by: pm
As Bovinicus mentioned, it depends.

In general though, and simplifying the process a bit, the CPU is loaded into a tester (such as the multi-million dollar Schlumberger ITS 9000 functional tester) that contains all of the worst-case code that that company can come up with (these are called "tests" or "patterns"). The CPU is then run at a variety of voltages, frequencies, and temperatures. The passing and failing test results, together with the temperature/voltage/frequency data, create a 2D array of data that is compared against the specifications set for that product, and the bin is determined by comparing the two. This is probably the most traditional approach. There are other methods that use test technology designed into the chip itself, or a hybrid approach.

So are the test criteria oriented toward maximizing production of high-speed parts, or toward meeting projected demand for each speed rating?

Also, how good a test is an overnight torture-testing program like Prime95, relative to the factory tester?



 

Macro2

Diamond Member
May 20, 2000
4,874
0
0
You have to remember that yields (good die per wafer) and speed binning are two different things. You can have good yields and lousy bins, or even vice versa. There is something called defect density that I don't have the time to go into right now.
 

OS

Lifer
Oct 11, 1999
15,581
1
76
Originally posted by: Macro2
You have to remember that yields (good die per wafer) and speed binning are two different things. You can have good yields and lousy bins, or even vice versa. There is something called defect density that I don't have the time to go into right now.

Yeah you're right, I mixed up some words. Should be fixed now.

 

human2k

Diamond Member
Jun 21, 2001
3,563
0
0
Originally posted by: pm
As Bovinicus mentioned, it depends.

In general though, and simplifying the process a bit, the CPU is loaded into a tester (such as the multi-million dollar Schlumberger ITS 9000 functional tester) that contains all of the worst-case code that that company can come up with (these are called "tests" or "patterns").


hah...I have 3 or 4 of those Schlumberger ITS 9000 functional testers in my garage somewhere.
 

pm

Elite Member Mobile Devices
Jan 25, 2000
7,419
22
81
So are the test criteria oriented toward maximizing production of high-speed parts, or toward meeting projected demand for each speed rating?
To be frank, neither. They are optimized to find defects and test the limits of the processor at various "corners" of voltage/frequency and temperature.

The issue of whether or not parts are "down-binned" comes up frequently here at AT, and I can honestly say that I don't know the answer as to whether or not this happens. If it does happen, it must be rarer than people seem to think. There is no business benefit to down-binning beyond the immediate short-term needs of the channel. Frequency distribution among parts is a Gaussian/normal distribution, and thus it makes sense to have pricing - and thus market demand - follow this same Gaussian curve. If you have an excess of high-speed parts, then it makes more sense to sock it to the competition by dropping prices on this excess high-speed capacity than it does to mark them and sell them at a lower speed - this lower-speed part will be less competitive in the marketplace, and yet you are not making more money from it (since you marked it as a lower speed). While I can picture rare cases where this might occur in order to maintain supply in the channel and minimize disruption of product roadmaps, it only makes business sense to me to have these events occur infrequently. Otherwise you are minimizing profits by selling a less competitive product line than you otherwise could be. This subject comes up at least once every couple of months, but I have yet to see anyone come up with a sound business reason for long-term "down-binning" by a company.
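As a quick illustration of the Gaussian point (the mean, spread, and bin cutoffs below are entirely made up), a few lines of Python show why the fastest bins are naturally the scarcest, which is what the pricing curve follows:

Code:
# Illustration only: invented mean/sigma for the fmax distribution of a
# hypothetical product, just to show how parts fall into speed bins.
import random

random.seed(0)
mean_fmax, sigma = 2200.0, 150.0          # MHz, invented numbers
bins_mhz = [2400, 2200, 2000, 1800]       # sellable speed grades, fastest first

counts = {b: 0 for b in bins_mhz}
scrap = 0
for _ in range(10000):                    # 10,000 simulated dice
    fmax = random.gauss(mean_fmax, sigma)
    for b in bins_mhz:
        if fmax >= b:                     # part goes in the fastest bin it can make
            counts[b] += 1
            break
    else:
        scrap += 1                        # below even the slowest bin

for b in bins_mhz:
    print(f"{b} MHz bin: {counts[b]} parts")
print(f"below lowest bin: {scrap} parts")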

As far as why overclocking works, my explanation is simple: overclocking eats into engineering margin. Microprocessors are used in a variety of applications, in a variety of environments, under a variety of stresses. Thus, in order to ensure that a product will meet operational specifications across a broad spectrum of operating conditions, there is margin in the product. A given CPU might be tested to run at a minimum voltage that is 10% less than the specification, at a temperature of, say, 80C. So the way that overclocking works is that you decrease the operational envelope that the CPU will see: for example, by spending more for a good power supply, using an expensive heatsink/fan, or increasing the cooling in the case. If the user can ensure that a CPU never experiences extreme conditions, then he is reducing the operational envelope and can thus increase the frequency. The most common overclocking technique, however, is to increase the voltage. This eats into the longevity of the CPU, though, and there is an exponential decrease in operating lifetime with each increase in voltage. There is a sound reason why manufacturers aren't increasing the voltage on the CPUs that they sell - if they did, they'd be able to sell higher-performing parts... but statistically those parts would not last as long. The voltage that a CPU ships at has been carefully engineered to maximize performance within a given operating lifetime.
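Just to show the shape of that lifetime trade-off - the baseline lifetime and the constant in the exponent below are invented for illustration, not taken from any real reliability model:

Code:
# Toy model: operating lifetime falling off exponentially with over-voltage.
# The 10-year baseline and the 0.05 V constant are invented purely to show
# the shape of the trade-off, not to predict anything about a real CPU.
import math

baseline_years = 10.0     # assumed lifetime at stock voltage
k_volts = 0.05            # assumption: each extra 0.05 V divides lifetime by e

def estimated_lifetime(overvolt_v):
    return baseline_years * math.exp(-overvolt_v / k_volts)

for dv in (0.00, 0.05, 0.10, 0.15, 0.20):
    print(f"+{dv:.2f} V over stock -> ~{estimated_lifetime(dv):.1f} years (toy model)")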

My reply to the question of hypothetical down-binning is to ask how much success someone would have overclocking a CPU using a lousy power supply, at stock voltage, with typical RAM, on a poorly designed motherboard, in a case that doesn't have very good ventilation, on a hot summer's day, running the code that maximally stresses the CPU. Most people would agree that any potential overclocking gains would be pretty small. Which leads back to my point about eating into engineering margin.
Also, how good a test is an overnight torture-testing program like Prime95, relative to the factory tester?
Prime95 is actually not a bad test, because it maintains a high level of CPU activity and thus increases the processor temperature. The testers, however, are capable of stressing the CPU much more, and their tests are optimized to run the code that stresses the specific circuitry that was known during the design phase to be the chip's frequency limiter. Still, Prime95 and some of the video games out there (like the fly-by in Unreal Tournament, and 3DMark) are pretty good tests.
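Roughly speaking, a torture test like Prime95 works by running a long calculation whose correct result is known in advance, so any silent arithmetic error shows up as a mismatch. A stripped-down sketch of that self-checking idea (nothing like the real Prime95 code, just the principle) might look like:

Code:
# Minimal self-checking stress loop: do a deterministic chunk of floating-point
# work over and over and compare against a reference result computed once.
# Any mismatch means the hardware produced a wrong answer somewhere.
import math

def workload():
    acc = 0.0
    for i in range(1, 200_000):
        acc += math.sin(i) * math.sqrt(i)
    return acc

reference = workload()          # assume the first run is correct
iterations = 20                 # a real torture test would run for hours

for n in range(iterations):
    if workload() != reference:
        print(f"hardware error detected on iteration {n}")
        break
else:
    print(f"{iterations} iterations, no mismatches")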
 

OS

Lifer
Oct 11, 1999
15,581
1
76


Wow, that was a really good write-up, Pat, thanks! That should be in the FAQ if it isn't already.





 

Jhhnn

IN MEMORIAM
Nov 11, 1999
62,365
14,684
136
I suspect that a fair amount of down-binning occurs just because of the interplay of various forces within a production operation. Somewhere between the production and marketing execs a goal is set: make so many of these, so many of those, etc., to meet the expected demand and the promises to the big OEMs. There's always some lag in communication between production capability and those who set the quotas. With a new process, it can be difficult to meet the quotas. But as the process improves over time, the quotas have a tendency to lag behind.

As a production manager or worker, the best-case scenario is to get the difficult part out of the way early on and coast on through turning out the easy stuff. It really doesn't matter that the quality curve of the production run is exceeding expectations; what matters is that those expectations are met. Rendering a 2.4 GHz core into a 2 GHz Celeron is no problemo when you're ahead of the curve and the stuff goes out the door as fast as you can make it. They do the same kind of thing at a sawmill, or any high-production operation. The boss will carry the message on up the chain of command that things are going better than expected, and maybe the big guys will want more or better next week. Meanwhile, the job gets done on time and on budget, and everybody goes home happy.

The production guys just try to dance to the tune set by the marketing guys, and try not to make any promises they can't deliver on ....
 

mrzed

Senior member
Jan 29, 2001
811
0
0
Thanks also for the write-up, pm. Very nicely explained.

I can think of one possible situation where down-binning might be a business decision: towards the end of a product cycle, when the process has matured. For example, a lot of people are overclocking Tualatin Celerons while simultaneously undervolting them. Perhaps this means the process is so mature at this point that there are simply not enough low-speed chips to meet demand. Nice for us end-of-line upgraders, though :)
 

sharkeeper

Lifer
Jan 13, 2001
10,886
2
0
The newest Prime95 is about the harshest software-based test I can find, PERIOD! A lot of O/C'd P4s that *never* crash in Windoze, in endless looping of 3DMark, etc., can error out in Prime95 in a few minutes to several hours. Always random. Lowering the clock speed, sometimes by as little as 50 MHz, corrects this. This is why I never trust a system that is overclocked even the slightest to do any calculations where accuracy is paramount.

Cheers!
 

MrDudeMan

Lifer
Jan 15, 2001
15,069
94
91
Originally posted by: sharkeeper
The newest Prime95 is about the harshest software-based test I can find, PERIOD! A lot of O/C'd P4s that *never* crash in Windoze, in endless looping of 3DMark, etc., can error out in Prime95 in a few minutes to several hours. Always random. Lowering the clock speed, sometimes by as little as 50 MHz, corrects this. This is why I never trust a system that is overclocked even the slightest to do any calculations where accuracy is paramount.

Cheers!

What are you blabbing about? Calculations? Unless you are doing some kind of distributed computing, wtf do you need every calculation possible to be correct for? I've had my P4 1.6 @ 2.4 for the longest time and nothing has ever gone wrong. I get no errors, etc., and mine isn't special by any stretch. I've had Athlons do the same thing: OC like a mofo and keep it there just fine. I've never noticed glitching or any crap like that...

So what gives you the impression that you can NEVER trust an overclocked CPU, other than Prime95, which is an ABSOLUTE WORST CASE scenario? (Besides the Schlumberger thingy.)
 

pm

Elite Member Mobile Devices
Jan 25, 2000
7,419
22
81
What are you blabbing about? Calculations? Unless you are doing some kind of distributed computing, wtf do you need every calculation possible to be correct for?
A remarkably polite post, Mr. Dudeman.

I remember some time ago when the game Unreal came out a lot of people complained that it was buggy and crashed constantly, and it was buggy and did crash frequently when it first came out. Then Tim Sweeney and the gang at Epic pushed out a series of patches that fixed most of the problems and still people complained. A month or so after the game came out I recall something being sent out from Epic that basically said something like, "If you are overclocking and Unreal is crashing for you, stop overclocking before you call us to complain. Unreal is an intensive program and it puts a lot of stress on the CPU and graphics subsystem. Overclocked systems that can appear stable on other applications may crash when running Unreal." I paraphrased this since I couldn't find it on their website (everything is Unreal II now), but I'm sure I could find the real thing if I searched enough and that it would read fairly close to this. The point is that people thought that they had stable overclocked computers, Unreal came out and they found out that the frequency that they overclocked to wasn't actually stable on an application that they wanted to run.

The frequency of a CPU is determined by the point at which the slowest circuit path on the chip doesn't complete its operation before the clock changes. The circuit won't be finished and will be reading an intermediate (incorrect) value; the clock will fire, and this intermediate, incorrect value will be accepted as the correct answer by the CPU, resulting in system errors. If you slow the clock down, the circuit will have more time to calculate the result and the CPU will work. If the CPU's temperature increases, the circuitry will run slower, and then again the circuit might not finish calculating before the clock transitions.
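To put some (invented) numbers on that: if the slowest path through the chip takes, say, 0.4 ns and gets a bit slower as the die heats up, the maximum clock you can run falls directly out of that delay. A quick sketch, with the delay and the temperature derating factor assumed purely for illustration:

Code:
# Toy critical-path timing: max clock = 1 / (slowest path delay).
# The 0.40 ns delay and the derating factor are invented numbers, only
# meant to show why heat and overclocking interact the way they do.

base_delay_ns = 0.40            # slowest circuit path at 60 C (assumption)
derate_per_c = 0.001            # +0.1% delay per extra degree C (assumption)

def max_clock_mhz(temp_c):
    delay_ns = base_delay_ns * (1 + derate_per_c * (temp_c - 60))
    return 1e3 / delay_ns       # 1 / delay, converted from ns to MHz

for temp in (60, 70, 80, 90):
    print(f"{temp} C: slowest path limits the clock to ~{max_clock_mhz(temp):.0f} MHz")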

The question here is: which circuit is the slowest one on the chip, and what program is going to test this circuit? At the company, the designers should have a pretty good idea of all the slowest circuit paths on the chip, and they write specific tests that check the speed of these paths for the multi-million dollar testers that I mentioned. But if you are a user, then you pretty much have to guess - and it's probable that you will guess wrong. If the slowest circuit on the chip is somewhere in the floating-point unit, then running Prime95 will probably uncover it. Even if it's not, Prime95 tends to raise the temperature of the chip to the point that other circuits in other units may fail. But even Prime95 is not the ultimate testing program - I merely said that it's not "a bad test". If the slowest circuit on the chip is in some other section - say, a translation lookaside buffer (TLB) - and Prime95 doesn't use the section of memory that would cause that circuit to execute, then Prime95 isn't going to find that path... since that circuit isn't actually doing anything while Prime95 is running. So even if you spend days, weeks, and months running Prime95, you could still be caught with a crashing computer some hypothetical day when Unreal 5 shows up and exercises some circuit path that wasn't checked by any previous program.

All of that said, if a user wants to check for stability at a given operating point, the best place to start is to run programs that really stress the system like Prime95, 3DMark, UT2003, etc.
 

OS

Lifer
Oct 11, 1999
15,581
1
76
Originally posted by: pm

I remember some time ago when the game Unreal came out a lot of people complained that it was buggy and crashed constantly, and it was buggy and did crash frequently when it first came out. Then Tim Sweeney and the gang at Epic pushed out a series of patches that fixed most of the problems and still people complained. A month or so after the game came out I recall something being sent out from Epic that basically said something like, "If you are overclocking and Unreal is crashing for you, stop overclocking before you call us to complain. Unreal is an intensive program and it puts a lot of stress on the CPU and graphics subsystem. Overclocked systems that can appear stable on other applications may crash when running Unreal." I paraphrased this since I couldn't find it on their website (everything is Unreal II now), but I'm sure I could find the real thing if I searched enough and that it would read fairly close to this. The point is that people thought that they had stable overclocked computers, Unreal came out and they found out that the frequency that they overclocked to wasn't actually stable on an application that they wanted to run.

I can vouch for that, I remember that also. Unreal was a significant step up in the gaming world, and so was the load it put on systems back then.