The platform makes a big difference. For example, a P4 is very much bandwidth starved. Its high clock speeds require correspondingly high bandwidth. Up until now, the P4's bus has been able to scale faster than the memory it was using, so dual channel was used to keep up. The gain from dual channel is pretty substantial, somewhere between 1 and 2 speed grades, i.e. about 300MHz worth of performance. However, it is nowhere near as effective as actually doubling the memory speed. If the P4 could use single channel DDR2 800, it would be faster than dual channel DDR400.
Anyway, the PM runs at lower clock speeds with a shorter pipeline, so it is not as starved for bandwidth, especially with its large 2MB cache. As such, the design focuses less on being able to ramp up bus speeds. The PM's bus currently tops out at 533MHz or 400MHz, depending on the core, so that bandwidth is much better used on single channel DDR2 533 or DDR400 than on dual channel DDR266 or DDR200. Of course, since there is no dual channel for the PM, this is a non-issue, but I thought you might like to know why they did not offer it.
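If you want to see the numbers behind this, here is a rough sketch of the peak bandwidth arithmetic in Python. It assumes a 64-bit (8 byte) wide front-side bus and memory channel, treats the quoted "MHz" figures as effective transfer rates, and uses an 800MHz-bus P4 as the example; the function name `peak_bandwidth_gbs` is just made up for illustration. These are theoretical peaks only and say nothing about latency or real-world performance.

```python
# Peak theoretical bandwidth = transfers per second x bytes per transfer x channels.
# Assumption: the front-side bus and each DDR/DDR2 channel are 64 bits (8 bytes) wide.
BUS_WIDTH_BYTES = 8


def peak_bandwidth_gbs(mega_transfers, channels=1):
    """Return peak bandwidth in GB/s (1 GB = 10**9 bytes) for an effective rate in MT/s."""
    return mega_transfers * 1e6 * BUS_WIDTH_BYTES * channels / 1e9


# Illustrative configurations; the 800MHz P4 bus is an assumed example.
configs = {
    "P4 bus, 800MHz": peak_bandwidth_gbs(800),
    "dual channel DDR400": peak_bandwidth_gbs(400, channels=2),
    "single channel DDR400": peak_bandwidth_gbs(400),
    "PM bus, 533MHz": peak_bandwidth_gbs(533),
    "single channel DDR2 533": peak_bandwidth_gbs(533),
    "dual channel DDR266": peak_bandwidth_gbs(266, channels=2),
}

for name, gbs in configs.items():
    print(f"{name:26s} {gbs:.2f} GB/s")
```

On paper, single channel DDR400 can only feed half of an 800MHz P4 bus, which is why the second channel helps there, while one channel of DDR2 533 already matches the PM's 533MHz bus.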
In general, if a platform has dual channel support, then running it in single channel is slower, although by how much depends on the platform. However, not all single channel architectures are worse than dual channel ones - just ask a socket 754 owner. The great architecture of the A64, combined with the higher clock speed the 754 chips have over their 939 counterparts, makes them just as fast.