If caches are so simple, why don't chipmakers pioneer their technology with standalone cache?

MadRat

Lifer
Oct 14, 1999
11,999
307
126
Intel just announced a working 4MB cache of SRAM on their newest 65nm process. Apparently it's normal for manufacturers to prove a new process on SRAM because it's less of a technical challenge. Why not try to profit off the new process prior to manufacturing CPUs on it? I mean, right now Intel has 130nm Pentium 4EEs that cram 2MB of L3 cache onto the already crowded core. If they used interconnects from the core to a separate L3 cache on the 90nm process, then the core of the CPU could have been the same as its consumer-level P4 version for both the P4EE and Xeon. Instead of a single 2MB L3 cache they probably could have offered 4MB at the 90nm node. If the cache failed testing, there probably isn't as much to lose. Packaging for L3 caches could probably include pinouts for both the current process and the next process, with a fuse controlling which set of interconnects is live.
 

Pudgygiant

Senior member
May 13, 2003
784
0
0
Didn't they already have those and did away with them because they were too slow? I'm talking 486 era.
 

LurchFrinky

Senior member
Nov 12, 2003
313
67
101
Originally posted by: Pudgygiant
Didn't they already have those and did away with them because they were too slow? I'm talking 486 era.
I'm pretty sure you're right, but I don't think it was that long ago. Wasn't this the exact reason they shifted to the PII slot design and then back again?

 

AndyHui

Administrator Emeritus, Elite Member, AT FAQ M
Oct 9, 1999
13,141
17
81
1. Going off-die to cache is slow.

2. They tried that with the Pentium Pro in a dual-cavity package design. You couldn't test the processor until after both the processor die and the cache were mounted. If either failed, then you lost both the cache and the CPU die. This is a lot more wasteful than simply blowing what cannot be used on the single die.
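AndyHui's yield argument can be made concrete with a toy calculation (the yield figures below are hypothetical, not real Pentium Pro data): when neither die can be tested until both are packaged together, the package only works if both dies are good, so the effective yield is the product of the two.

```python
# Toy illustration of the dual-cavity packaging yield problem.
# Both yield figures are hypothetical, chosen only for illustration.
cpu_yield = 0.80    # fraction of CPU dies that work
cache_yield = 0.85  # fraction of cache dies that work

# If the dies can only be tested after both are mounted, a bad die
# on either side scraps the whole package (good die included):
package_yield = cpu_yield * cache_yield  # 0.68

# With pre-package testing, only known-good dies get mounted, so no
# good CPU die is ever thrown away with a bad cache partner.
print(f"untested-pair package yield: {package_yield:.2f}")
```

The gap between 0.68 and the individual die yields is exactly the waste AndyHui describes.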
 

rjain

Golden Member
May 1, 2003
1,475
0
0
It also complicates the packaging procedure, no? Instead of a single die, they have to put two dice and connect them. There will probably also be routing complications because the cache doesn't need to be connected to any pins (does it?) so all the wires from the chip to the pins on the side where the cache is put need to be squeezed around the cache. My impression is that that's easy when you're building the chip (it's just part of the mask), not so easy when you're connecting the pads (you have to physically run the wires around there and make sure they don't come too close).
 

Peter

Elite Member
Oct 15, 1999
9,640
1
0
One word: Frequency.

Try to handle a 3 GHz backside cache bus.

This is the reason why EVERYONE went back to CPU internal caches, after the long episode of L2 cache on the front side bus (socket-7), and the brief appearances of L2 cache on a dedicated backside bus on a CPU cartridge (Slot-1, Slot-A).

Separate caches became increasingly useless as CPU frequencies went through the roof. With a 500 MHz CPU, you're not going to see much benefit from a 100 MHz L2 cache; even in the slot approaches, cache speed had to be kept at 1/2 or even 1/3 the core speed despite the physically close location and dedicated bus.
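The diminishing return from a slow external cache can be sketched with the standard average-memory-access-time formula (all cycle counts and miss rates below are invented for illustration, not measured figures):

```python
# Average memory access time (AMAT) sketch, measured in CPU core cycles.
# Every latency and miss rate here is a hypothetical illustration value.
def amat(l1_hit, l1_miss_rate, l2_hit, l2_miss_rate, mem_latency):
    # AMAT = L1 hit time + L1 misses * (L2 hit time + L2 misses * DRAM latency)
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * mem_latency)

# A 500 MHz core with a 100 MHz external L2: each L2 cycle costs ~5 core
# cycles, so a nominally 2-cycle L2 lookup takes ~10 core cycles.
slow_l2 = amat(l1_hit=1, l1_miss_rate=0.05, l2_hit=10, l2_miss_rate=0.2,
               mem_latency=100)
# The same L2 pulled on-die and run at full core clock:
fast_l2 = amat(l1_hit=1, l1_miss_rate=0.05, l2_hit=2, l2_miss_rate=0.2,
               mem_latency=100)
print(slow_l2, fast_l2)  # 2.5 vs 2.1 cycles per access
```

The higher the core clock climbs relative to the external cache, the bigger that gap gets, which is the pressure that pushed L2 on-die.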


So as soon as this became feasible, everyone pulled the L2 cache onto the CPU die. AMD had a first and not-so-successful attempt (*) (production-wise) with the K6-III, followed up by the K6-2+ and III+, before Intel went back from Slot 1 to Socket 370 and AMD quickly followed from Slot A to Socket A.

Now with AMD64, we're getting the next move - pulling the entire main DRAM array onto a dedicated bus directly on the CPU.

(*) What, I'm not supposed to say "s t a b"? What gives?
 

MadRat

Lifer
Oct 14, 1999
11,999
307
126
If the cache is off-die then why the need to strap it to a single memory bus clock? Cache speeds should be able to outstrip CPU core speeds if anything.

I understand that an external cache running at 1/2 or 1/3 of the core clockspeed was necessary because of trace length. But now we don't need wires; we could use a dedicated low-power wireless signal between the core and the cache to eliminate trace lengths. At the distances we're talking, it should be a cinch to ramp transfers between chip and cache to 120-150Gbps, with latency significantly lower than an off-die 1/2 or 1/3 speed cache. With this short a transfer distance and with the inherent shielding of the socket, the low power of the transmission would keep it relatively free from interference with other nearby chips using the same wavelength, correct?
 

rjain

Golden Member
May 1, 2003
1,475
0
0
Wireless? Hah! Of COURSE air is a much better conductor than wires....

How on earth is the socket shielding anything? It doesn't even cover the whole package. Is it even grounded?

Where on earth do you get your 120-150Gbps numbers?

Latency is minimal with wires, too. It's the latency of the SRAM cells and addressing circuitry.
 

f95toli

Golden Member
Nov 21, 2002
1,547
0
0
Wireless at 150 Gbps is simply not possible; there is nothing that can modulate a wireless signal that quickly. You would need something like 250-500 GHz of bandwidth (rule-of-thumb calc: 150*3 in order to get decent rise-time of the pulses).
Getting anything wireless off a chip faster than a few Gbps is still VERY difficult and is usually done using semiconductor lasers.
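f95toli's rule of thumb is straightforward arithmetic; here is the calculation spelled out (the factor of 3 is the rough figure quoted in the post, not a hard physical limit):

```python
# Rule-of-thumb from the post: the analog bandwidth needed is roughly
# 3x the bit rate, so that pulse rise times stay sharp enough to resolve.
# The factor 3 is the approximation quoted above, not a derived constant.
def required_bandwidth_ghz(bit_rate_gbps, factor=3):
    return bit_rate_gbps * factor

for rate in (120, 150):
    bw = required_bandwidth_ghz(rate)
    print(f"{rate} Gbps -> ~{bw} GHz of analog bandwidth")
```

For the 120-150 Gbps figures proposed earlier in the thread, that works out to roughly 360-450 GHz, which lands in the 250-500 GHz range quoted.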


 

rjain

Golden Member
May 1, 2003
1,475
0
0
And wouldn't fiber optics be better than a semiconductor laser through air?
 

Pudgygiant

Senior member
May 13, 2003
784
0
0
W00t I win!

I'd like to thank all the little people I had to step on to get here...

(The rules of course being the first correct post is the winnar.)