Originally posted by: lexxmac
Matthias, I'd like to ask you for some proof that a chip to chip connection cannot run at 1 GHz or higher.
It would be quite simple to implement a 1 GHz chip interconnect with today's technology. However, the data path would be so narrow that it wouldn't be worth the cost of redesigning existing platforms to use it to interface with CPUs. Of course, pin count is always a major cost consideration, which is why we have RAMBUS.
Yes, external SRAM is relatively low performance compared to on-die cache, but it is, as you said yourself, still 10 times faster than DRAM even if it is more expensive to manufacture. Using the excuse of high cost is, well, wrong.
Oh, see, this is where you're wrong, lexx. Cost is the reason why we don't see external SRAM modules anymore. In the old days, you could see tangible performance benefits because clock speeds varied little between the various components. Popping in off-chip SRAM caches cut access latency relative to main memory, which back then took something like 70 ns during burst reads; today's memory does around 40 ns for a random read.
Today, to get any kind of benefit out of caching with the old SRAM-module mentality, the performance level required means extremely high cost. Even in systems that try to incorporate more caches, the performance benefit is often measured in single-digit percentage points while the costs run into four or five figures. You'd have to be seriously gung-ho about squeezing out the last drop of performance to buy off-die cache modules, and at that point you're probably more likely to just replace your main memory with SRAM.
Speaking of which, Cray machines traditionally used SRAM-only designs for exactly that reason: performance, no matter how small the gain, was the main goal, and cost was not a consideration.
The more on-die cache you have, the more die space you take up per core, driving up production costs due to low yields. By cutting the amount of on-die cache, you increase yields on the CPUs themselves. Then, split the external cache between several chips, say four, with one, two, or even four megs each. Remember, low yields mean high cost. The larger the die, the lower the yields. Cut it up and increase the amount to compensate for the minor speed loss of not having it all on the same piece of silicon, and then you still have a faster and cheaper product.
You want to move on-die cache onto separate modules. You believe that this will drive down costs. Your argument about die yield is along the right track, but it ignores everything else. Yes, smaller dies increase yield. However, too small a die also increases cost. The main reason cache is on-die is speed. There really is no easier way to decrease access latency than having the cache physically close to the CPU. With the cache on-die, L2 accesses of less than 5 cycles are possible, and that's somewhat conservative. If you move the cache onto a separate chip, latency increases to tens or even hundreds of cycles. Long latency absolutely kills CPU performance, although I believe the CPU runs a lot cooler.
Along the same lines, cache on-die means the cache itself runs at very high clock speeds.
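Just to put rough numbers on it, here's a back-of-the-envelope average-memory-access-time calculation in Python. Every figure in it (the hit rates, the ~5-cycle on-die L2 versus a ~40-cycle off-die module, the 200-cycle trip to DRAM) is an assumption picked for illustration, not a measurement of any real part.

# Back-of-the-envelope AMAT (average memory access time) sketch.
# All latencies and hit rates below are illustrative assumptions.

def amat(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, mem_latency):
    # AMAT in cycles: L1 hit time plus the weighted cost of going further out.
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem_latency)

on_die  = amat(l1_hit=2, l1_miss_rate=0.05, l2_hit=5,  l2_miss_rate=0.2, mem_latency=200)
off_die = amat(l1_hit=2, l1_miss_rate=0.05, l2_hit=40, l2_miss_rate=0.2, mem_latency=200)

print("on-die L2:  %.2f cycles" % on_die)   # 4.25 cycles
print("off-die L2: %.2f cycles" % off_die)  # 6.00 cycles

Even with a 95% L1 hit rate, pushing the L2 off-die inflates the average access time by roughly 40% in this toy model, and that's before accounting for the off-die cache also running at a lower clock.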
There is a reason why Intel and AMD moved cache on-die as soon as it was technologically and financially feasible. For any given process technology, a certain number of defects tend to show up on each wafer, and the average is fairly constant between process improvements. That means there exists an ideal die size at which yield just about reaches its peak; shrink the die further and the yield stays roughly flat, because the remaining defects still claim about the same fraction of dies. What does this mean? It means that when your CPU logic takes up less space than that ideal die size, you're better off filling the extra space with cache than building another CPU design. Not only do you get big performance benefits, you also bypass the need for Pentium Pro-style or cartridge-type packaging which, btw, is seriously expensive.
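To make the yield argument concrete, here's a toy Poisson yield model in Python. The defect density, wafer area, and die sizes are made-up numbers for illustration; real fabs guard those figures closely.

import math

D = 0.5             # assumed defect density, defects per square cm
WAFER_AREA = 300.0  # assumed usable wafer area in square cm, ignoring edge loss

for area in (4.0, 2.0, 1.0, 0.5, 0.25):  # candidate die areas in square cm
    y = math.exp(-D * area)              # Poisson model: chance a die has no defect
    good = y * (WAFER_AREA / area)       # good dies per wafer
    print("%.2f cm^2 die: yield %3.0f%%, about %4.0f good dies per wafer" % (area, y * 100, good))

In this toy model, going from a 1.0 cm^2 die to 0.5 cm^2 roughly doubles the good dies per wafer, but most of that comes from fitting more dies on the wafer; the yield itself only climbs from about 61% to 78%. Once the logic is already small, shrinking further buys little, which is why spending the spare area on cache is such a good deal compared with paying for a second chip and exotic packaging.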
If, on the other hand, we were to follow your philosophy, one quickly realizes that individual CPUs would not be the right component to focus our efforts on. Instead, we'd build massively parallel systems, because lots of CPUs would "...compensate for the minor speed loss of not having it all on the same piece of silicon, and then you still have a faster and cheaper product." I think we all know that's not entirely true. Massively parallel works sometimes, but not always; the same goes for massive amounts of off-die cache versus a small, fast cache.
Itaniums cost so much because the people who need them have more money than they can shake a stick at, and Intel can get away with charging that much for them.
Xeons cost so much because the very same type of people don't want to scrap existing x86 systems and invest in IA-64 but still want performance. Notice that Xeons have on-die L3 cache. If it were that much cheaper to use off-die L3, Intel would do it, and still charge the same amount as they do now just to reap higher margins. Cache is on-die for a reason, and it's not to screw over the customer.
Originally posted by: Pudgygiant
What would be the limitations prohibiting dual-pumping HyperTransport or similar at 800 MHz per pipe?
If I remember correctly, HyperTransport is already "double-pumped." It transfers data on both the rising and falling edges of each clock.
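The arithmetic on what double-pumping buys is straightforward. The 16-bit link width below is just an assumption for illustration, since HyperTransport links come in several widths.

clock_hz   = 800e6  # 800 MHz link clock, per the question above
transfers  = 2      # double-pumped: data on both the rising and falling edge
width_bits = 16     # assumed 16-bit-wide link, per direction

peak_bytes_per_sec = clock_hz * transfers * width_bits / 8
print("peak: %.1f GB/s per direction" % (peak_bytes_per_sec / 1e9))  # 3.2 GB/s

So an 800 MHz clock already puts 1600 MT/s on the wire before you touch the clock rate at all.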