That's my point: redundancy is but one of the techniques for DFM (design for manufacturability). The problem for you and me is that we don't know how extensively, or not, any specific DFM tool - such as redundancy - has been embodied in a given Intel IC.
We can make all manner of assumptions about how and where DFM is or isn't used, and that would make for an enjoyable and lively discussion. My point was just that such a discussion really wouldn't generate any more answers, probably just more questions - not a bad thing in its own right, but I felt it was worth mentioning in advance.
There is another issue that affects the amount of "harvesting" that can go on: reliability. Depending on the nature and origin of the defect in the defective circuit, the defect itself may "grow" over time to incapacitate neighboring circuits and eventually the entire chip.
If the "growth time" is comparable to the warranty period then harvesting and selling the chip will just come to represent a in-the-field liability, no one wants that.
You can assume defects are spread pretty uniformly by area. When they're not, a lot of engineering effort goes into fixing the more-likely-to-be-defective parts.
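To make the "uniform by area" assumption concrete: the usual back-of-the-envelope model is Poisson, where the chance a block survives falls off exponentially with its area. A minimal sketch (the defect density and areas below are made-up numbers, not anything from a real process):

```python
import math

def poisson_yield(area_mm2: float, defects_per_mm2: float) -> float:
    """Poisson yield model: probability a block of given area has zero defects."""
    return math.exp(-area_mm2 * defects_per_mm2)

D0 = 0.005  # assumed defect density, defects/mm^2 (made up)

print(f"100 mm^2 die:       {poisson_yield(100.0, D0):.3f}")  # ~0.607
print(f"0.5 mm^2 ALU block: {poisson_yield(0.5, D0):.4f}")    # ~0.9975
```

The takeaway is that big structures soak up most of the defects while tiny blocks almost never get hit, which is why redundancy pays off in big, regular structures rather than in small random logic.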
As far as I know, redundancy for manufacturability happens in 3 places:
1) Vias. These are the connections between metal layers, and they don't print well. To compensate, wherever possible, multiple vias are inserted. A while back, there were articles claiming that Nvidia's 40nm Fermi yields were low because they didn't use enough redundant vias. The "overhead" for adding redundant vias ranges from zero to small, depending on how congested your routing (wiring) is. The benefit is enormous.
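The via math is easy to sketch: if one via fails with probability p and failures are independent, a doubled via only fails when both do, i.e. with probability p². Multiplied across tens of millions of via sites, that's the difference between losing a big chunk of your dies and losing essentially none (the failure probability and via count here are made up):

```python
p = 1e-8        # assumed probability that a single via is bad (made up)
n = 50_000_000  # assumed number of via sites on a large die (made up)

yield_single  = (1 - p) ** n      # every via is a single point of failure: ~0.61
yield_doubled = (1 - p * p) ** n  # a site fails only if both vias fail: ~1.0
print(f"single vias: {yield_single:.3f}, doubled vias: {yield_doubled:.9f}")
```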
2) SRAM (memory, e.g. caches and register files) rows / columns. If you're building an SRAM with 128 entries and 128 bits per entry, you can expand it to 129*129 (or 128*129 or 129*128, depending on what sort of repairability you want). You then pay an overhead of ~1-2 gates of timing penalty and a small area penalty (my math says very roughly 2% in this example), and can tolerate a defect anywhere in the whole SRAM. Since SRAMs tend to be very large, you get a huge bang for the buck here. A 64KB cache made from 1024 entries of 512 bits could be built as 1025*513, and there's a good chance that a random defect will fall onto something as big as the 64KB cache (check out some K7/K8/etc die photos to see just how big the instruction and data caches are). Note that you wouldn't actually build 1024*512 as one array...you'd break it into smaller chunks and add redundancy to each chunk, so the overhead is actually more than the ~0.3% in this example.
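The area-overhead arithmetic from that paragraph, as a quick sketch (this is just the spare-row/spare-column bookkeeping, ignoring the repair muxes and fuses):

```python
def spare_overhead(rows: int, cols: int, spare_rows: int = 1, spare_cols: int = 1) -> float:
    """Fractional cell-array growth from adding spare rows/columns to an SRAM."""
    return (rows + spare_rows) * (cols + spare_cols) / (rows * cols) - 1

print(f"128x128  -> 129x129:  {spare_overhead(128, 128):.2%}")   # ~1.57%
print(f"1024x512 -> 1025x513: {spare_overhead(1024, 512):.2%}")  # ~0.29%
```

The bigger the array, the cheaper one spare row and one spare column get, which is exactly why the 64KB-cache number comes out so small.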
One neat trick Intel apparently used with Atom is to incorporate ECC into your cache. If you have a defective cell*, ECC will correct the error for you, and your ECC protection effectively falls back to parity on that one cache line. For funky statistical math reasons (which would probably lead to an "0.999...=1"-style flame war if we tried to discuss them), you can still claim that you have the reliability of ECC even though you're taking away some of its protection.
*defective could mean functional-but-slow, or functional-but-not-at-very-low-voltage too.
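Here's a toy model of that ECC fallback, assuming a standard SECDED code (correct one bit, detect two) and modeling the defective cell as stuck-at; the error rates are invented purely for illustration:

```python
import random

def secded(n_errors: int) -> str:
    """Abstract SECDED: corrects 1 bit error per word, detects (but can't fix) 2."""
    return ("clean", "corrected", "detected-uncorrectable")[min(n_errors, 2)]

P_SOFT = 1e-3  # assumed per-read soft-error probability in the word (made up)

def read_word(has_stuck_cell: bool) -> str:
    # A stuck-at cell contributes one hard error whenever the stored value
    # disagrees with the stuck value (roughly half the time, assumed).
    hard = int(has_stuck_cell and random.random() < 0.5)
    soft = int(random.random() < P_SOFT)
    return secded(hard + soft)

trials = 1_000_000
for stuck in (False, True):
    bad = sum(read_word(stuck) == "detected-uncorrectable" for _ in range(trials))
    print(f"stuck cell={stuck}: {bad} uncorrectable reads out of {trials}")
```

A healthy line never hits the uncorrectable case in this model; the line with the stuck cell does, but only when a soft error also lands on it. That's the sense in which ECC "falls back to parity" on that one line.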
3) Core-level or cache-level. L2 and L3 caches that are designed for a target size usually support disabling portions of the cache, so you can sell the chip even if there is an unrepairable defect in part of the cache (for example, a P2 or P3 with defective cache could be sold as a Celeron). In many designs, you can disable cores, so you can sell a quad-core with 1 defective core as a tri-core or dual-core. The overhead for parts that aren't defective is zero.
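The binning logic itself is conceptually trivial - something like this toy sketch, where the product names, core counts, and cache thresholds are all invented:

```python
def bin_die(good_cores: int, good_l3_mb: int) -> str:
    """Toy SKU binning: map the surviving cores/cache of a tested die to a product."""
    if good_cores >= 4 and good_l3_mb >= 8:
        return "flagship quad-core, 8MB L3"
    if good_cores >= 3 and good_l3_mb >= 6:
        return "tri-core, 6MB L3"
    if good_cores >= 2 and good_l3_mb >= 4:
        return "value dual-core, 4MB L3"
    return "scrap"

print(bin_die(4, 8))  # fully working die -> flagship
print(bin_die(3, 8))  # one dead core    -> sold as a tri-core
print(bin_die(4, 5))  # cache damage     -> falls to the dual-core bin
```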
Pseudo-4) Logic cells come in different "sizes" for driving larger or smaller loads / sending signals longer or shorter distances. The larger cells are effectively a bunch of smaller cells wired up in parallel. Certain failures in certain parts of larger cells can make them effectively weaker without breaking functionality, provided the circuit still operates at the required speed. I don't think anybody really uses this as a yield-improvement technique, though, since upsizing gates unnecessarily would cost a lot of power.
You generally won't find redundancy in random logic such as ALUs. Structures like integer ALUs are actually pretty small, so the chance of a defect in them is low, and making an ALU "repairable" by adding a second ALU is a lot of overhead (every non-defective part now carries a useless "spare" and most defective parts didn't have the defect in the ALU). It's also very difficult to functionally incorporate the redundant ALU. For example, if you wanted to add 1 spare ALU to a 3-ALU design, you need to be able to get the inputs of all 3 original ALUs to the spare, and then get the result back to the right place. Shuffling data around ("bypassing") is expensive enough without redundant components.
It boils down to this: you can only add redundancy when the overhead doesn't exceed the incremental yield benefit, and that tends to happen in places where you don't have to duplicate structures entirely.
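You can put numbers on the ALU example with the same Poisson model from above (everything here is a made-up illustration: a 100 mm^2 die with three 0.5 mm^2 ALUs, plus one spare):

```python
import math

def poisson_yield(area_mm2: float, d0: float) -> float:
    return math.exp(-area_mm2 * d0)

D0, DIE, ALU = 0.005, 100.0, 0.5  # made-up defect density (/mm^2) and areas (mm^2)

# Baseline 3-ALU die: any defect anywhere kills it.
base = poisson_yield(DIE, D0)

# Add a 4th, spare ALU: the die grows by ALU mm^2, but now survives as long
# as the non-ALU area is clean and at most 1 of the 4 ALU instances is defective.
q = poisson_yield(ALU, D0)            # one ALU instance is defect-free
alus_ok = q**4 + 4 * q**3 * (1 - q)   # zero or one of four defective
spare = poisson_yield(DIE - 3 * ALU, D0) * alus_ok

print(f"no spare: {base:.4f}  with spare: {spare:.4f}")  # ~0.6065 vs ~0.6111
```

Under these assumptions the spare buys well under 1% of relative yield while costing 0.5% extra area on every single die - and that's before the muxing and bypass overhead needed to actually wire the spare in. That's the arithmetic behind "only where the overhead doesn't exceed the benefit".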
As for selling a factory-second like a processor with one of its integer pipelines disabled, that gets complicated too. The biggest technical problem is probably that you have to verify that the processor still works properly. Processor verification is incredibly complicated, and you'll often find that one unit makes assumptions about the behavior of another unit. If the relative throughput of two units changes, the design may no longer work properly - potentially in very complicated, hard-to-find cases (so you can't just see if Windows boots and SuperPI runs and call it "good enough"). The other big problem is that major customers tend to prefer consistency - they don't want to have to qualify their software on a thousand different possible configurations (if cutting the reorder buffer in half costs 1% performance on average, the customer still has to make sure that none of his applications suffers a 10% degradation and, e.g., starts dropping frames during video playback).