cnhoff - Memory performance is practically a non-issue with RC5. Its working set fits inside an 8 KB L1 cache (well, 16 KB at any rate, in a Harvard architecture with split instruction and data caches). S@H and RC5 are TOTALLY different beasts. RC5 makes exclusive use of the ALUs, and is heavily reliant upon the bit-wise rotate left instruction.
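To show why rotate left matters so much, here is a minimal sketch of the RC5-32 round structure in Python. It is not the distributed.net client code. The names `rotl32`, `rotr32`, `rc5_encrypt_block`, `rc5_decrypt_block` and the table `DEMO_S` are hypothetical, and `DEMO_S` is a made-up stand-in for the expanded key table, not the output of the real RC5 key schedule.

```python
MASK32 = 0xFFFFFFFF  # keep every value inside a 32-bit word


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits (only the low 5 bits of n count)."""
    n &= 31
    x &= MASK32
    return ((x << n) | (x >> (32 - n))) & MASK32


def rotr32(x, n):
    """Rotate a 32-bit word right by n bits, expressed as a left rotate."""
    return rotl32(x, (32 - (n & 31)) & 31)


def rc5_encrypt_block(a, b, s, rounds=12):
    """Encrypt one 64-bit block (two 32-bit halves) with subkey table s."""
    a = (a + s[0]) & MASK32
    b = (b + s[1]) & MASK32
    for i in range(1, rounds + 1):
        # Each half-round is one XOR, one data-dependent rotate, one add:
        # pure integer ALU work with no floating point and almost no memory.
        a = (rotl32(a ^ b, b) + s[2 * i]) & MASK32
        b = (rotl32(b ^ a, a) + s[2 * i + 1]) & MASK32
    return a, b


def rc5_decrypt_block(a, b, s, rounds=12):
    """Invert rc5_encrypt_block by undoing the half-rounds in reverse order."""
    for i in range(rounds, 0, -1):
        b = rotr32((b - s[2 * i + 1]) & MASK32, a) ^ a
        a = rotr32((a - s[2 * i]) & MASK32, b) ^ b
    b = (b - s[1]) & MASK32
    a = (a - s[0]) & MASK32
    return a, b


# Hypothetical subkey table for illustration only (2 * rounds + 2 words).
# A real client derives this from the candidate key via the RC5 key schedule.
DEMO_S = [(0x9E3779B9 * (i + 1)) & MASK32 for i in range(2 * 12 + 2)]
```

With 12 rounds, the whole subkey table is 26 words (104 bytes), which is why the cache hierarchy barely comes into play.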
S@H benefits from a low-latency memory hierarchy and strong FPU performance. Both are improved in the Athlon4/MP compared to the T-bird: the first thanks to hardware prefetch (which lowers average memory latency by increasing the hit rate of the on-die caches when seeking instructions), and the second, potentially, thanks to the inclusion of SSE (I don't know how much, if at all, the S@H client is optimized for SSE). The improved TLBs also help to reduce memory latency.
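The link between hit rate and average latency can be sketched with the standard average memory access time formula (hit time plus miss rate times miss penalty). The function name `average_access_time` and all the numbers below are hypothetical, chosen only to show the direction of the effect. They are not measurements of either the T-bird or the Athlon4/MP.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average memory access time in cycles: hit_time + miss_rate * miss_penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time + miss_rate * miss_penalty


# Hypothetical figures: same cache hit time and miss penalty, but prefetch
# turns some would-be misses into hits, so the miss rate drops.
without_prefetch = average_access_time(hit_time=3, miss_rate=0.10, miss_penalty=100)
with_prefetch = average_access_time(hit_time=3, miss_rate=0.05, miss_penalty=100)
```

A workload whose miss rate is already close to zero, as with RC5, gains next to nothing from this, while one that streams through a lot of data, as with S@H, can see a real difference.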
RC5 does not benefit from the improvements made in the Athlon4/MP.