I'd say for the NV40:
32x4 (650 million transistors producing 450 W)
2.5 GHz core clock (many wasted cycles)
1000 MHz memory clock on a 1024-bit bus (128 GB/s)
R4XX:
16x8 (425 million transistors producing 190 W)
700 MHz core clock
4 x 256-bit memory channels, each feeding 1/4 of the GPU (1000 MHz)
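
Quick sanity check on the memory numbers, as a rough sketch. It assumes one transfer per clock (no DDR doubling) and 1 GB = 10^9 bytes, which is what the 128 GB/s figure above implies. The function name and its parameters are made up for illustration only.

```python
def bandwidth_gb_s(clock_mhz, bus_bits, channels=1, transfers_per_clock=1):
    """Peak memory bandwidth in GB/s, taking 1 GB as 10**9 bytes."""
    bytes_per_transfer = bus_bits / 8  # bus width in bytes
    transfers_per_second = clock_mhz * 1e6 * transfers_per_clock
    return transfers_per_second * bytes_per_transfer * channels / 1e9

# NV40 guess: a single 1024-bit bus at 1000 MHz
nv40 = bandwidth_gb_s(1000, 1024)

# R4XX guess: four independent 256-bit channels at 1000 MHz
r4xx = bandwidth_gb_s(1000, 256, channels=4)

print(f"NV40: {nv40:.0f} GB/s")  # NV40: 128 GB/s
print(f"R4XX: {r4xx:.0f} GB/s")  # R4XX: 128 GB/s
```

Under those assumptions both guesses land on the same peak bandwidth. The difference is only in how it is split up: one wide bus versus four channels that each serve a quarter of the chip.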