With the Athlon 64's on-die memory controller, your system RAM is kind of like L3 nowadays. I was talking to my boss about this exact thing the other day. Why not put about 128 MB of super-fast GDDR3 or something similar on the board and use it as a buffer in front of RAM? To be honest, I think it's just not worth the added latency and cost of figuring out how to use it. The CPU's memory controller would have to decide on its own what lives in the buffer and what goes straight to RAM, and that's probably just too much work.
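The latency argument can be made concrete with a back-of-the-envelope average memory access time (AMAT) calculation. This is a minimal sketch, assuming the buffer is checked first on every L2 miss and DRAM is only accessed when the buffer misses (a serial lookup). The latency figures used below are hypothetical placeholders, not measured numbers for GDDR3 or any real board. Under those assumptions the buffer only pays off when its hit rate exceeds the ratio of buffer latency to DRAM latency.

```python
def amat_direct(t_dram_ns: float) -> float:
    """Average latency per L2 miss when every miss goes straight to DRAM."""
    return t_dram_ns


def amat_with_buffer(t_buf_ns: float, hit_rate: float, t_dram_ns: float) -> float:
    """Average latency per L2 miss with a hypothetical on-board buffer.

    Assumes a serial lookup: the buffer lookup cost is always paid, and
    on a buffer miss the full DRAM latency is paid on top of it.
    """
    return t_buf_ns + (1.0 - hit_rate) * t_dram_ns


def break_even_hit_rate(t_buf_ns: float, t_dram_ns: float) -> float:
    """Hit rate above which the buffer beats going directly to DRAM.

    Solves t_buf + (1 - h) * t_dram < t_dram  =>  h > t_buf / t_dram.
    """
    return t_buf_ns / t_dram_ns


if __name__ == "__main__":
    # Hypothetical latencies in nanoseconds, for illustration only.
    T_BUF, T_DRAM = 30.0, 60.0
    print("break-even hit rate:", break_even_hit_rate(T_BUF, T_DRAM))
    for h in (0.3, 0.5, 0.8):
        print(f"hit rate {h:.1f}: buffered {amat_with_buffer(T_BUF, h, T_DRAM):.1f} ns"
              f" vs direct {amat_direct(T_DRAM):.1f} ns")
```

With a serial lookup, a low hit rate makes things worse than having no buffer at all, which is the "added latency" problem in a nutshell.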
On the Intel side, the P4 EEs with 2 MB of L3 (yes, Computer MAn) really weren't all that impressive, in my opinion. The architecture should benefit a lot from extra cache, since its very long pipeline is already hard to keep full. But I think we'll all agree that a properly cooled 3.8 GHz P4 with no L3 is faster than a 3.4 GHz EE with 2 MB. It's just time to shuck that broken design and move on to dual-core Pentium Ms for the desktop.