Yeah pretty much agree, except maybe this part:
At a very basic level both techniques try to increase throughput by accessing independent memory units in parallel, but dual channel memory controllers are restricted to those two channels, while RAID 0 can scale up to the number of device channels available on the controller, until the central bus is saturated.
There's nothing preventing you from interleaving 32 channels of memory if you want to, other than physical limits such as pin counts (the number of traces would be insane with DDR SDRAM).
But in higher-end systems, 4-way interleaving isn't too uncommon, and the EV7 Alphas use 8 RDRAM channels IIRC. It needs mentioning, though, that RDRAM at 16 bits wide, which is what I believe the EV7 uses, has a far lower pin count than your 64-bit DDR DIMMs, so it's not really comparable to 8-way interleaving of DDR SDRAM.
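To put rough numbers on that, here's a quick Python sketch. It assumes plain cache-line interleaving (consecutive 64-byte lines go to consecutive channels), which is only one possible mapping and not necessarily what any particular chipset does. The function names are made up for illustration, and the pin figures count data lines only, ignoring address, control and ECC pins.

```python
def channel_for_address(addr, num_channels, line_bytes=64):
    """Map a physical address to (channel, offset inside that channel).

    Assumes simple round-robin interleaving at cache-line granularity;
    real memory controllers may use a different mapping.
    """
    line = addr // line_bytes           # which cache line the address is in
    channel = line % num_channels       # consecutive lines rotate over channels
    local_line = line // num_channels   # line index within the chosen channel
    return channel, local_line * line_bytes + addr % line_bytes


def data_pins(num_channels, width_bits):
    """Data lines only: channels times channel width (no address/control/ECC)."""
    return num_channels * width_bits


if __name__ == "__main__":
    # Sequential lines spread over 4 channels
    for addr in range(0, 5 * 64, 64):
        print(hex(addr), channel_for_address(addr, 4))

    print("8 x 16-bit RDRAM:", data_pins(8, 16), "data lines")
    print("2 x 64-bit DDR:  ", data_pins(2, 64), "data lines")
    print("8 x 64-bit DDR:  ", data_pins(8, 64), "data lines")
```

On data lines alone, 8 channels at 16 bits come out the same as ordinary dual channel DDR, while 8-way DDR would need four times as many.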
Anyway, that's nitpicking, and yeah, the problem is most certainly elsewhere.
When you say removing one module, do you mean you have two 512 MB modules (for example) and remove one, or do you mean you use either one 1 GB module or two 512 MB modules?
Going from 2x512 MB to 1x512 MB could certainly make it far more choppy, since you lose half your RAM, while going from 2x512 MB to 1x1024 MB would make only a very small difference.