Why don't CPU makers have 'factory outlets', irregulars, etc.

VirtualLarry

No Lifer
Aug 25, 2001
56,570
10,196
126
You would think that Intel, at least, would take some of their quad-core dies that are partially defective and sell them as lesser CPUs. (Can you imagine a 1366 single-core or dual-core CPU?) You have to wonder how much money Intel is losing on partially defective dies. AMD at least does something like this: they sell dual- and triple-core CPUs made from defective quad-core dice.
 

mutz

Senior member
Jun 5, 2009
343
0
0
They did it with socket 775, didn't they?

Just about a year ago they released that 9505 with its smaller cache, and maybe the E7500 counts as a binned quad?
 

aphorism

Member
Jun 26, 2010
41
0
0
Manufacturing costs are cheap, particularly for in-house fabs like Intel's. Yields are very high because that's one of the main goals of their designs. When you sell at Intel's kind of volume, it can make a big difference.
 

bryanW1995

Lifer
May 22, 2007
11,144
32
91
It would cheapen the brand, for one thing. The 9505 was OK because it came out near the end of the Penryn lifecycle; in fact, Nehalem had been out for a while by the time it appeared.

Another reason Intel doesn't do it as much is that they're not having to push their clocks that high. AMD is pushing their CPUs like crazy to make up for Intel's IPC advantage, which leaves AMD with more CPUs that can't make the grade. Intel, OTOH, has been able to be very conservative with its clocks and find a slot for most of its CPUs without having to make up a new SKU just for failed parts.
 

mutz

Senior member
Jun 5, 2009
343
0
0
Another point worth mentioning: just as with the 9505, Intel could sell dual-core 1366 parts near the end of the 1366 life cycle. By not offering a wide range of options on that platform now, they limit buyers to certain SKUs such as the 9xx parts and make full profit on fully working CPUs rather than a smaller profit on core-disabled ones.

A dual-core 1366 part would also go up against the i3 options with their two HT-enabled cores. The i3 allows for cheaper platforms in general than X58, which draws in more buyers and still gives them a future upgrade path (toward the i5s and i7s).

So it's possible, but it doesn't seem likely, since Intel has decided to go with two platforms aimed at what are broadly two market segments...
 

Dekasa

Senior member
Mar 25, 2010
226
0
0
I'm not sure about their current i3 and i5 lineups, but in the past a lot of their cutting down was just on cache. Late Core 2s had some pretty massive L2 caches, and the biggest difference between the E5/7/8000s was L2. Also, back then their quads were two dual-core dies, so they had no reason to make a tri-core.

Now their quads are monolithic (all cores on one die), but all the quads they make are either on 1366, which is totally performance-oriented and has no real place for tri-cores, or are high-end i5/i7 parts. A 3-core/6-thread i7 would be very close in performance to the 4-core/4-thread i5 750, so there's no place for cut-down i7s, and an i5 750 with 3 cores would be so close to the other i5s that it would be useless too. There's no point in making all CPUs go through that extra level of testing, plus possible cutting, for that low a volume.

More or less, they just don't need them.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
Intel does cut down on features: parts without VM support, parts without HT support, parts without turbo... We've kinda always assumed (the collective "we", as in common forum lore) that Intel created these cut-down parts artificially/intentionally so as to propagate their desired market stratification and segmentation... but what if they really are born from the desire to reduce/reuse/recycle as much of those chips as possible?
 

mutz

Senior member
Jun 5, 2009
343
0
0
We've kinda always assumed (the collective "we", as in common forum lore) that Intel created these cut-down parts artificially/intentionally so as to propagate their desired market stratification and segmentation... but what if they really are born from the desire to reduce/reuse/recycle as much of those chips as possible?
There are two plausible scenarios here. One is that a company manufactures all of its processors for the top market segment, then cuts them down to create the different market tiers. The other is that each market segment gets its own SKUs, built independently (to a certain extent), which is too risky because not all processors come out of the line equal.

What seemingly happens is that the company manufactures a line of CPUs on a best-effort basis, each capable of whatever clock it can reach: a certain number of CPUs can operate at 3.33, the majority land around 2.66, and others can scale to 2.8, 3.06, 3.4 and so on.

They take the highest-clocking CPUs and sell them as XE parts, and the rest they down-clock to where the majority of chips can safely run, so you end up with some chips barely meeting the spec and others running with headroom, all listed under the same model. An easy-going 2.75 under a 130W TDP goes into the 2.66 bin, and so on, while a 2.9 gets saved for a later launch (such as the 930) to maintain the market momentum (a rough sketch of this kind of binning follows at the end of this post).

That's a very clever strategy, and Intel is doing it amazingly well. It's impressive to see how they handle different portions of the market at different times of the year with new CPUs, lowering the prices of the older parts and setting the new ones higher. They always seem to want to keep the market alive and throw something new into it, even when their newer architecture is on the edge of being revealed.
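
A rough sketch of that kind of speed binning, with made-up bin frequencies, margin and test results (real binning also weighs voltage, leakage and TDP, not just maximum clock):

Code:
# Toy speed-binning pass: assign each tested die to the fastest sell-as bin
# it clears with some headroom. All numbers here are invented for illustration.
BINS_GHZ = [3.33, 3.06, 2.80, 2.66]   # sell-as frequencies, fastest first
MARGIN   = 0.05                       # require a little headroom above the bin

def bin_die(max_stable_ghz):
    """Return the fastest bin this die qualifies for, or None (park/scrap)."""
    for f in BINS_GHZ:
        if max_stable_ghz >= f + MARGIN:
            return f
    return None

tested = [3.45, 2.75, 3.12, 2.96, 2.68, 2.91]   # GHz each die passed at (made up)
for ghz in tested:
    b = bin_die(ghz)
    label = f"{b} GHz bin" if b else "no bin (park or scrap)"
    print(f"die stable at {ghz:.2f} GHz -> {label}")

In this toy model the easy-going 2.75 part lands in the 2.66 bin with headroom to spare, while the 2.68 part doesn't clear the lowest bin with margin and gets parked - the "some chips going hard against the spec, others going easy" situation described above.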
 

Voo

Golden Member
Feb 27, 2009
1,684
0
76
Intel does cut down on features: parts without VM support, parts without HT support, parts without turbo...
Ok, maybe you (or someone else) knows this: What are usually the highest failure parts of a chip? I assume it depends on the number of layers at that particular position and the overall complexity of the part? Or is the largest factor still size? In that case the cache would be the first candidate for cutting down, I'd think.

In any case, I don't think anything but HT adds much complexity or needs a considerable amount of die space, so I think the "conventional wisdom" still holds: market segmentation and not wanting to weaken their brands.

@mutz: Not to forget that those masks are, afaik, extremely expensive, so that's probably also a factor there..
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
Voo, what you ask is of course a very difficult question to answer because what you are basically asking is "what are the weakest points in Intel's DFM (design for manufacturability) procedure?".

There are all kinds of design-rule limits and requirements for any given node aimed at avoiding the obvious yield-loss mechanisms... so things like metal density (the CMP erosion/dishing issue) get deftly handled with a dummy-metal design rule; if they aren't, then this becomes a pronounced area of weakness in a design when it goes to the fab for production.

So obviously I can't answer your question unless I start making all kinds of assumptions regarding Intel's design rules and DFM. And TBH even within Intel I'd be surprised if more than 300 people know enough about the actual specifics of the interaction between process technology and circuit layout to succinctly answer your question.

This just isn't one of those topics where tens of thousands of people have the opportunity to gain enough experience that they can speak with authority on the subject. (Which is kinda sad, as an engineer, because the work is just so much fun... but I think NASA engineers feel the same way when they look at their jobs too: it's super geeky, but so few actually get to do it in real life.)
 

Phynaz

Lifer
Mar 13, 2006
10,140
819
126
IDC, couldn't we assume a few things such as built in redundancy? It's my understanding that because cache is so dense, CPUs are actually designed with extra, and then unneeded / defective cache is fused off at test time.

So, if we define "the highest failure parts of a chip" as the parts where a failure makes the chip unsellable, then it may be safe to assume that a physical defect in an ALU logic area could be a chip killer.

I have no idea what logic on a CPU would have redundancy. Probably a very closely guarded secret.

This doesn't address parametric yield though. Do you think Intel's process / manufacturing ability has progressed to the point where they no longer get bin failures? Is that even possible?
 

BD231

Lifer
Feb 26, 2001
10,568
138
106
Why don't CPU makers have 'factory outlets', irregulars, etc.

Why would Intel or AMD sell messed-up parts when they could sell perfectly working parts at full market value to the same person, simply by not presenting them with such an option in the first place?

You do see AMD sell failed X4s as X3s, so there ARE parts out there that get sold as "irregulars".
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,786
136
In any case, I don't think anything but HT adds much complexity or needs a considerable amount of die space,

Even Hyperthreading should only take 1-2% of the total die size. I'd think features like Hyperthreading and Turbo Boost are disabled/enabled because of a combination of market leverage and clocking (i.e. how high the chip will clock).
 

mutz

Senior member
Jun 5, 2009
343
0
0
What are usually the highest failure parts of a chip?
It's probable that the companies attack spots that turn out to be consistently defective, where it's possible to add more layers or change certain portions of the chip architecture; those look more like defects arising from the design flow, which can be analyzed and then addressed.

Saying that an ALU might be the most defective part of a processor is quite a stretch; it's almost like saying the engine is what causes most vehicles to be pulled off the production line. These defects should be much smaller than a whole ALU or FPU, and it could be that many processors have some defects in them while added circuitry for redundancy softens the effect on binning. Beyond that, common problems should be addressed by the companies, whether through better redundancy, newer materials, or going over the design flow again and again.
 

Voo

Golden Member
Feb 27, 2009
1,684
0
76
Even Hyperthreading should only take 1-2% of the total die size. I'd think features like Hyperthreading and Turbo Boost are disabled/enabled because of a combination of market leverage and clocking (i.e. how high the chip will clock).
Yep. Since even AMD's module approach adds only 5% to the overall die size, I'd say that <2% is probably a good guess - but the other stuff mentioned (turbo, VM support) should add almost nothing to the die size.

@IDC: Thanks for your answer. I thought that maybe one could make reasonable assumptions; too bad that doesn't work... it would be interesting.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
IDC, couldn't we assume a few things such as built in redundancy?

That's my point: redundancy is but one of the protocols for DFM (design for manufacturability). The problem for you and me is that we don't know how extensively, or not, any specific DFM tool - such as redundancy - has been embodied in a given Intel IC.

We can make all manner of assumptions about how and where DFM is or isn't used, and that would make for an enjoyable and lively discussion. My point there was just to say that such a discussion really wouldn't generate any more answers, probably just more questions - not a bad thing in its own right, but I felt it was worth mentioning in advance.

There is another issue that affects the amount of "harvesting" that can go on: reliability. Depending on the nature and origin of the defect in the defective circuit, the defect itself may "grow" in time to incapacitate neighboring circuits and eventually the entire chip.

If the "growth time" is comparable to the warranty period, then harvesting and selling the chip will just come to represent an in-the-field liability, and no one wants that.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
That's my point: redundancy is but one of the protocols for DFM (design for manufacturability). The problem for you and me is that we don't know how extensively, or not, any specific DFM tool - such as redundancy - has been embodied in a given Intel IC.

We can make all manner of assumptions about how and where DFM is or isn't used, and that would make for an enjoyable and lively discussion. My point there was just to say that such a discussion really wouldn't generate any more answers, probably just more questions - not a bad thing in its own right, but I felt it was worth mentioning in advance.

There is another issue that affects the amount of "harvesting" that can go on: reliability. Depending on the nature and origin of the defect in the defective circuit, the defect itself may "grow" in time to incapacitate neighboring circuits and eventually the entire chip.

If the "growth time" is comparable to the warranty period, then harvesting and selling the chip will just come to represent an in-the-field liability, and no one wants that.

You can assume defects are pretty uniformly spread based on area. When they're not, a lot of engineering effort goes into fixing the more-likely-to-be-defective parts.
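
To put a number on "uniformly spread based on area": under that assumption, the chance that a given defect lands in a block is simply that block's share of the die area, which is why the big caches soak up most defects while a small ALU is rarely the culprit. A quick sketch with invented area fractions (not real Intel floorplan data):

Code:
# If defects land uniformly per unit area, P(defect hits a block) is just the
# block's fraction of total die area. The shares below are made up, only meant
# to show why caches dominate and small logic blocks like ALUs rarely get hit.
blocks = {"caches (L2/L3, SRAM)": 0.45, "core logic": 0.35, "uncore/IO/other": 0.20}
alu_fraction_of_core_logic = 0.08   # guess: ALUs are a small slice of core logic

for name, share in blocks.items():
    print(f"P(random defect lands in {name}) = {share:.0%}")
print(f"P(random defect lands in an ALU) ~ "
      f"{blocks['core logic'] * alu_fraction_of_core_logic:.1%}")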

As far as I know, redundancy for manufacturability happens in 3 places:
1) Vias. These are the connections between metal layers, and they don't print well. To compensate, wherever possible, multiple vias are inserted. A while back, there were articles claiming that Nvidia's 40nm Fermi yields were low because they didn't use enough redundant vias. The "overhead" for adding redundant vias ranges from zero to small, depending on how congested your routing (wiring) is. The benefit is enormous.

2) SRAM (memory, e.g. caches and register files) rows / columns. If you're building an SRAM with 128 entries and 128 bits per entry, you can expand it to 129*129 (or 128*129 or 129*128 depending on what sort of repairability you want). You then pay a timing penalty of ~1-2 gates and a small area penalty (my math says very roughly 2% in this example), and can tolerate a defect anywhere in the whole SRAM. Since SRAMs tend to be very large, you get a huge bang for the buck here. A 64KB cache made from 1024 entries of 512 bits could be built as 1025 * 513, and there's a good chance that a random defect will fall onto something as big as the 64KB cache (check out some K7/K8/etc die photos to see just how big the instruction and data caches are). Note that you wouldn't actually build 1024*512...you'd break it into smaller chunks and add redundancy to each chunk, so the overhead is actually more than the ~0.3% of the single-array example (the area arithmetic is spelled out in the sketch after this list).

One neat trick Intel apparently used with Atom is to incorporate ECC into your cache. If you have a defective cell*, ECC will correct the error for you, and your ECC protection effectively falls back to parity on that one cache line. For funky statistical math reasons (which would probably lead to an "0.999...=1"-style flame war if we tried to discuss them), you can still claim that you have the reliability of ECC even though you're taking away some of its protection.
*defective could mean functional-but-slow, or functional-but-not-at-very-low-voltage too.

3) Core-level or cache-level. L2 and L3 caches that are designed for a target size usually support disabling portions of the cache, so you can sell the chip even if there is an unrepairable defect in part of the cache (for example, a P2 or P3 with a defective cache could be sold as a Celeron). In many designs, you can disable cores, so you can sell a quad-core with 1 defective core as a tri-core or dual-core. The overhead for parts that aren't defective is zero. (A rough back-of-the-envelope harvest estimate along these lines follows at the end of this post.)

Pseudo-4) logic cells come in different "sizes" for driving larger or smaller loads / sending signals longer/shorter distances. The larger cells are effectively a bunch of smaller cells wired up in parallel. Certain failures in certain parts of larger cells can make them effectively weaker without breaking functionality, if the circuit still operates at the required speed. I don't think anybody really uses this as a yield-improvement technique though, since upsizing gates unnecessarily would cost a lot of power.
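
The area arithmetic for the spare-row/spare-column scheme in point 2, spelled out (array sizes are the ones from the post; everything else is just illustration):

Code:
# Area overhead of adding one spare row and one spare column to an SRAM array:
#   overhead = (rows + 1) * (cols + 1) / (rows * cols) - 1
def spare_row_col_overhead(rows, cols):
    return (rows + 1) * (cols + 1) / (rows * cols) - 1

print(f"128 x 128 array  : {spare_row_col_overhead(128, 128):.2%} extra bitcells")
print(f"1024 x 512 array : {spare_row_col_overhead(1024, 512):.2%} extra bitcells")
# In practice the 64KB cache would be split into smaller sub-arrays, each with
# its own spares, so the real overhead is higher than the single-array figure.

That works out to roughly 1.6% for the 128x128 case (the "very roughly 2%" above) and roughly 0.3% for the single big 1024x512 array, before accounting for the sub-array split.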

You generally won't find redundancy in random logic such as ALUs. Structures like integer ALUs are actually pretty small, so the chance of a defect in them is low, and making an ALU "repairable" by adding a second ALU is a lot of overhead (every non-defective part now carries a useless "spare" and most defective parts didn't have the defect in the ALU). It's also very difficult to functionally incorporate the redundant ALU. For example, if you wanted to add 1 spare ALU to a 3-ALU design, you need to be able to get the inputs of all 3 original ALUs to the spare, and then get the result back to the right place. Shuffling data around ("bypassing") is expensive enough without redundant components.

It boils down to this: you can only add redundancy where the overhead costs less than the yield it buys back, and that tends to be in places where you don't have to duplicate structures entirely.

As for selling a factory-second like a processor with one of its integer pipelines disabled, that gets complicated too. The biggest technical problem is probably that you have to verify that the processor still works properly. Processor verification is incredibly complicated, and you'll often find that one unit will make assumptions about the behavior of another unit. If the relative throughput of two units changes, the design may no longer work properly - potentially in very complicated, hard-to-find cases (so you can't just see if Windows boots / SuperPI runs and say "good enough"). The other big problem is that major customers tend to prefer consistency - customers don't want to have to qualify their software on a thousand different possible configurations (if cutting the reorder buffer in half costs 1% performance on average, the customer still has to make sure that none of his applications will suffer a 10% degradation and e.g. start dropping frames during video playback).
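
Tying point 3 back to the thread's opening question, here is a back-of-the-envelope harvest estimate using a simple Poisson defect model (expected defects in a region = defect density x area). The defect density and die/core areas below are invented, not Intel numbers; the point is just to show how a wafer splits into fully good dies, dies salvageable by fusing off one core, and everything else:

Code:
from math import exp

# All numbers are made up for illustration.
D0        = 0.3                    # defects per cm^2
CORE_AREA = 0.35                   # cm^2 per core
N_CORES   = 4
REST_AREA = 1.2                    # cm^2 of cache/uncore/IO on the same die

p_core_ok = exp(-D0 * CORE_AREA)   # P(a given core has no defect)
p_rest_ok = exp(-D0 * REST_AREA)   # P(the non-core area has no defect)

p_all_good = p_rest_ok * p_core_ok ** N_CORES
# Exactly one bad core, everything else clean -> sellable as a tri-core.
p_harvest  = p_rest_ok * N_CORES * (1 - p_core_ok) * p_core_ok ** (N_CORES - 1)
p_other    = 1 - p_all_good - p_harvest   # cache/uncore hit, or 2+ bad cores

print(f"fully good quad-core dies : {p_all_good:.1%}")
print(f"harvestable as tri-cores  : {p_harvest:.1%}")
print(f"everything else           : {p_other:.1%}")

With these made-up numbers, roughly a fifth of the dies would be tri-core candidates - but whether that's worth a SKU depends on the real defect density, the floorplan, the reliability concern IDC raised, and the verification/consistency issues above, none of which we can see from outside the fab.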
 

mutz

Senior member
Jun 5, 2009
343
0
0
Some info regarding how chips are tested and sorted:

Each chip is tested throughout the entire process - both while part of a wafer and after separation. During a procedure known as "wafer sort," an electrical test is conducted to eliminate defective chips. Needle-like probes conduct over 10,000 checks per second on the wafer. A chip that fails a test for any reason in this automated process is marked with a dot of ink that indicates it will not be mounted in packaging.

http://download.intel.com/education/common/en/resources/TJI/TJI_TechSociety_handout4.pdf
 

bamacre

Lifer
Jul 1, 2004
21,029
2
61
(Can you imagine a 1366 single-core or dual-core CPU?)

I used to get and sell some Xeon W3500-series dual-core 1366 CPUs. People were using them in Core i7 boards and getting some good OCs. I was selling them for ~110.