<< 1. Can do 64-bit arithmetic in one instruction rather than having to do it as two 32-bit parts, carrying the one, and stitching the results together. >>
But don't let this fool you into thinking that a 64-bit CPU will somehow be twice as fast as a 32-bit CPU. Most arithmetic is handled using SISD (Single Instruction, Single Data) instructions, i.e., two operands and one result. If you're doing 1 + 1, it makes no difference to performance whether you express the operands as 64-bit integers and add them on a 64-bit CPU, or express them as 32-bit integers and add them on a 32-bit CPU; both operations complete at the same speed (actually, the 64-bit add may have slightly higher latency, depending on the type of carry-lookahead adder used). The vast majority of the time, code written for a 32-bit CPU only needs 32-bit integers, so the 64-bit CPU's ability to execute 64-bit SISD instructions gives no performance advantage. Floating-point arithmetic does gain accuracy from increased bit width, but x87 already uses 80-bit internal precision for FP math.
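To make the quoted point concrete, here is a minimal C sketch of what "two 32-bit parts, carrying the one" looks like. The names u64_pair, add64_on_32, and add64_native are made up for illustration; a real compiler targeting 32-bit x86 would typically emit an add followed by an add-with-carry rather than the explicit comparison shown here.

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves, the way a 32-bit CPU sees it.
   (Hypothetical type, for illustration only.) */
typedef struct {
    uint32_t lo;
    uint32_t hi;
} u64_pair;

/* Add two 64-bit values using only 32-bit arithmetic. */
u64_pair add64_on_32(u64_pair a, u64_pair b) {
    u64_pair r;
    r.lo = a.lo + b.lo;               /* low halves; wraps modulo 2^32 */
    uint32_t carry = (r.lo < a.lo);   /* 1 if the low add overflowed */
    r.hi = a.hi + b.hi + carry;       /* high halves plus the carried one */
    return r;
}

/* The same operation when the CPU has 64-bit registers: one add. */
uint64_t add64_native(uint64_t a, uint64_t b) {
    return a + b;
}
```

The two-step version only costs extra when the values really need more than 32 bits, which is the exception rather than the rule in typical 32-bit code.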
SIMD instructions, on the other hand, do benefit from wider adders. For example, a 64-bit SIMD unit can do two 32-bit adds at once, and a 128-bit SIMD unit can do four. But the G4 and the P4/Athlon already have 128-bit SIMD, with AltiVec and SSE respectively.
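The "two 32-bit adds at once" idea can be imitated in plain C by packing two 32-bit lanes into one 64-bit word and keeping carries from crossing the lane boundary. This is only a software sketch of the concept, not how AltiVec or SSE are actually programmed; the function name add2x32 is made up for illustration.

```c
#include <stdint.h>

/* Treat each 64-bit word as two independent 32-bit lanes and add them
   lane by lane. The top bit of each lane is handled separately so that
   a carry out of the low lane never leaks into the high lane. */
uint64_t add2x32(uint64_t a, uint64_t b) {
    const uint64_t top = 0x8000000080000000ULL;  /* top bit of each lane */
    uint64_t sum = (a & ~top) + (b & ~top);      /* add the low 31 bits of each lane */
    return sum ^ ((a ^ b) & top);                /* fold the top bits back in, carry-free */
}
```

A real SIMD unit does the same job in hardware by simply cutting the carry chain at the lane boundaries, so a wider unit handles more lanes per instruction.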
In fact, all else being equal, a 64-bit CPU running 64-bit code will be 5-10% slower than its 32-bit counterpart. Pointers, and any integers widened to 64 bits, are twice as large, so the L1 and L2 caches hold fewer of them, resulting in lower hit rates. (Instruction size is less of a factor than it might seem: PowerPC instructions stay a fixed 32 bits wide in 64-bit mode, while x86-64 code grows slightly from extra prefix bytes.) The speed disadvantage can be overcome by including larger caches.
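A rough way to see the cache effect is to compare how many pointer-carrying records fit in a cache of a given size. The sketch below is an illustration under stated assumptions: node32 and node64 are hypothetical linked-list nodes that model the "pointer" as a fixed-width integer so both layouts can be compared on one machine, and the 32 KiB figure in the note afterward is just an example cache size.

```c
#include <stddef.h>
#include <stdint.h>

/* List node as laid out with 32-bit pointers (pointer modeled as uint32_t). */
typedef struct {
    uint32_t next;
    int32_t  value;
} node32;

/* The same node with 64-bit pointers; alignment padding usually makes it
   larger than the sum of its fields. */
typedef struct {
    uint64_t next;
    int32_t  value;
} node64;

/* How many whole nodes of a given size fit in a cache of cache_bytes,
   ignoring tags, associativity, and everything else a real cache has. */
size_t nodes_that_fit(size_t cache_bytes, size_t node_bytes) {
    return cache_bytes / node_bytes;
}
```

Comparing nodes_that_fit(32 * 1024, sizeof(node32)) with nodes_that_fit(32 * 1024, sizeof(node64)) shows the wider node leaving room for noticeably fewer entries in the same cache, which is where the lower hit rate comes from.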