<< From what I have read it's more about the complexity of integrating the DRAM with the chip. Apparently it complicates the chip so much that it becomes nearly impossible to work with. >>
Chips are already unbelievably complex. Teams of 100-300 engineers work 2-5 years full time on designs involving tens of millions of transistors. I can't imagine that complexity is the problem - at minimum you could do what some L2 SRAM cache designs do and partition the problem completely: the core is one design, the embedded DRAM block another, and you integrate them at a few select interface points. That separates one problem into two independent ones.
But I don't think even that is necessary. DRAM is an array: you design one bank, and you don't really need to worry about the rest. It's like SRAM caches - a way is a complete design, and beyond that point you just tile up ways to make a cache array. The pain of doing this is pushed out to the decoder and control logic, but that's still no more complex than a standard design at the bank level... and the more complex decoder logic can simply be synthesized and autorouted.
<< I always thought EDRAM had to be implemented like pointers in C... but here you have to make the chip point to the correct memory addresses. >>
You should be able to do it so that it is completely invisible to the end user - it should integrate seamlessly. I can't imagine that anyone would attempt this if it required a recompile; in the x86 world, recompiles are generally frowned upon (to put it mildly). If I were designing such a beast, I would either try to put so much DRAM on the chip that the system never needs a memory upgrade (i.e. aim for >=64MB) or, if that weren't possible due to reticle-size considerations, use the on-chip DRAM to implement a mammoth cache (16MB, for example).