Bus, in the K8: the question doesn't really make sense.
Where the caches show diminishing returns depends a lot on everything else. For the tasks most PC users need, high-speed RAM can't replace cache. Even at a somewhat low bandwidth, cache has extremely low latency, being right there.
The thing about a lot of physics and similar workloads is that you can have hundreds of items that are already in sequential order in memory, or that can at least be put into an order like that. Then several operations need to be done to each of them, again across many, many items. If the chip can be kept fed, these tasks should be able to just keep scaling up.
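A rough sketch of that access pattern, in Python just for readability. The names (step, positions, velocities) and the numbers are made up for illustration, and Python itself won't show the hardware effect; the point is only the shape of the work: flat, contiguous arrays walked front to back, with the same few operations applied to every item.

```python
from array import array

def step(positions, velocities, dt, gravity):
    """Advance every item by one time step, walking both arrays in order."""
    for i in range(len(positions)):
        # Same few operations for every item, in memory order.
        velocities[i] += gravity * dt
        positions[i] += velocities[i] * dt

# Hypothetical data set: 500 items stored contiguously as doubles.
positions = array("d", [0.0] * 500)
velocities = array("d", [2.0] * 500)

step(positions, velocities, dt=0.5, gravity=-2.0)
```

Because item i+1 is always the next thing needed, the hardware can stream the data in ahead of time, which is why this kind of job cares more about bandwidth than about a big cache.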
Look at Celerons. A 3GHz Celeron is almost as fast at several encoding tasks as the real P4. However, try using it for simple things like office apps, and compare it against the P4.
As far as the server stuff, they really don't, right now. AT's reviews show the same kind of pattern found elsewhere:
Apache scales like nobody's business. It would probably do quite well with extra cores that have small caches. However, that performance starts to dwindle in real use, with PHP, and maybe a DB attached.
DB apps scale well with CPU speed, but less so with extra cores. Since speed bumps are coming more slowly, extra cores are a must (and in AMD's case, a dual-core chip does add more than two separate CPUs would), but a faster single core would reap more performance. Also, with the overwhelming majority of stuff being in cache when needed, removing the cache would mean constantly going out to RAM. Even fast RAM, like on video cards, is nowhere close to as fast as cache. It might move 30GB/s, but the time taken to move a handful of bytes from point A to point B is ludicrous.
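To put rough numbers on the latency-versus-bandwidth point, here's a back-of-the-envelope sketch in Python. Only the 30GB/s figure comes from above; the latencies and the cache bandwidth are made-up illustrative values, not measurements of any real part, and the model (fixed latency plus size divided by bandwidth) is a deliberate simplification.

```python
def transfer_time_ns(num_bytes, latency_ns, bandwidth_gb_per_s):
    """Simplified cost model: fixed latency plus streaming time.

    1 GB/s is 1 byte per nanosecond, so bytes / (GB/s) gives nanoseconds.
    """
    return latency_ns + num_bytes / bandwidth_gb_per_s

# Assumed figures, for illustration only.
FAST_RAM = {"latency_ns": 100.0, "bandwidth_gb_per_s": 30.0}  # 30GB/s from the post
CACHE = {"latency_ns": 5.0, "bandwidth_gb_per_s": 10.0}       # deliberately low bandwidth

for size in (8, 64, 1_000_000):
    ram = transfer_time_ns(size, **FAST_RAM)
    cache = transfer_time_ns(size, **CACHE)
    print(f"{size:>9} bytes: fast RAM {ram:10.1f} ns, cache {cache:10.1f} ns")
```

With these assumed values, the handful-of-bytes cases are dominated entirely by latency, so the cache wins easily even with a third of the bandwidth. Only the big streaming transfer favors the fast RAM, which is the encoding/physics case again.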
Applications can be ported to use a particular design well (that's part of how otherwise average dual G5s manage to make a cheap top-10 supercomputer), but it takes time and effort that most people, and companies, can't spare. A well-balanced solution is generally better than a highly tweaked one. The balanced solution will offer good, predictable performance across a range of uses, even with stock parts and software. Current x86 stuff fits the bill, with cache taking up plenty of space. The Cell, with so little in the way of cache, is anything but balanced.
Even if this XDR (or DDR3) is extra-expensive, we're already paying for it in the form of high-end video cards, with the additional expense of system memory. If the whole allotment of RAM was high-speed, turbo-cache/hyper-memory on high-end GPUs could work well.
Additional expense? You can get 1GB of name-brand DDR RAM for under $90. That's CHEAP.
The other problem is that this RAM won't run that fast when it's used as system RAM. As a signal passes along the bus, it gets bounced around and picks up interference. There's much more room for that to occur on the motherboard's memory bus(es) than on a video card.
Video card: rarely more than four chips per RAM channel, and only a few inches of traces to go through.
System: eight or sixteen chips per module, with one to four modules per channel, and each module is plugged into a shared bus.
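Just to spell out the arithmetic in that comparison, here's a tiny Python sketch. The constants are the figures given above; the names are mine.

```python
# Figures from the comparison above.
VIDEO_CHIPS_PER_CHANNEL_MAX = 4
CHIPS_PER_MODULE = (8, 16)
MODULES_PER_CHANNEL = (1, 4)

def system_chips_per_channel():
    """Return the (fewest, most) chips a system memory channel may carry."""
    fewest = min(CHIPS_PER_MODULE) * min(MODULES_PER_CHANNEL)
    most = max(CHIPS_PER_MODULE) * max(MODULES_PER_CHANNEL)
    return fewest, most

fewest, most = system_chips_per_channel()
print(f"system: {fewest} to {most} chips per channel")
print(f"video card: up to {VIDEO_CHIPS_PER_CHANNEL_MAX} chips per channel")
```

So even the lightest system configuration hangs twice as many chips on a channel as the video card does, and a fully loaded one hangs sixteen times as many, before counting the sockets and longer traces.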
How can you clean up the signal between chips, banks, and over the bus? A point-to-point design either needs more pins and traces, or increases latency (FBDIMM). A method with a repeater would add to latency too. RDRAM (again, dunno about XDR) got some help from having dummy modules in the unused slots, which kept the bus from being much of a problem.
System RAM has to deal with a very different situation than video RAM, and must be very tolerant--you don't know what it will work with! On a video card, it's all set down. The BIOS is made specifically for the RAM chips it uses, and the other stuff, like the PCB and power regulation, still only has to work with a limited range of parts. If they work, you're done (OK, it's hyperbole, but the point remains). With a typical RAM stick, there will be dozens, probably more, of chips and chipsets, and hundreds of motherboards and BIOSes, that may need to use it. To top that off, they may then need to run it at a different speed than the max it was sold as.