Here is what I know about SM2.0 vs. SM3.0:
Say you have a shader that is 500 instructions long.
A 6800 running SM3.0 can do this in a single pass.
An NV3X/R3XX/R420 (none of which support SM3.0) running SM2.0 can do this in 6 passes in the best case (if the shader can be broken up into several shaders of 9X instructions each). That would be an enormous performance hit.
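The "6 passes" figure is just the instruction count divided by the per-pass limit, rounded up. A minimal sketch, assuming a limit of 96 instructions per pass (64 arithmetic + 32 texture in ps_2_0, one reading of "9X"). Any limit from 90 to 99 gives the same answer, and the count ignores the extra instructions a multipass split needs to write out and reload intermediate results.

```python
import math

def passes_needed(total_instructions: int, max_per_pass: int) -> int:
    """Best-case number of passes if a long shader could be split perfectly
    into chunks of at most max_per_pass instructions. Ignores the overhead
    of saving and reloading intermediate results between passes."""
    return math.ceil(total_instructions / max_per_pass)

# Assumption: 96 instructions per pass on SM2.0 hardware.
print(passes_needed(500, 96))  # 6
```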
I don't think that's how things work. As far as I know, on the R3xx it would take at least 250 cycles to execute a 500-instruction shader program regardless of whether it was PS2 or PS3. The NV3X would take at least 500 cycles. No idea regarding the R420. How fast a shader program executes depends more on the hardware than on the shader standard.
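The lower bound above comes from dividing the instruction count by how many instructions the hardware can issue per clock, and the shader model never enters the formula. A minimal sketch, assuming issue rates of about 2 instructions per clock for R3xx and 1 for NV3X (the rates implied by the 250 and 500 cycle figures, not official specifications):

```python
def min_cycles(instructions: int, instructions_per_cycle: int) -> int:
    """Lower bound on cycles to run a shader, given an assumed issue rate."""
    return -(-instructions // instructions_per_cycle)  # ceiling division

# Assumed issue rates: R3xx ~2 per clock, NV3X ~1 per clock.
print(min_cycles(500, 2))  # 250 (R3xx)
print(min_cycles(500, 1))  # 500 (NV3X)
```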
Here, in NVIDIA's own words, are the differences between PS2 and PS3:
link
The only performance benefits listed are those gained by dynamic branching and "multiple render targets"
I imagine that the benefit gained from dynamic branching is illustrated by this crude example:
Without dynamic branching:
a = some calculations
b = some different calculations
c = (((case == m) * 0xffffffff) & a) | (((case == n) * 0xffffffff) & b)
(Forgive me if this is only understandable to programmers.)
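To make the bit-masking version concrete, here is a minimal sketch of the same select. `some_calculations` and `some_different_calculations` are hypothetical stand-ins, and values are treated as 32-bit unsigned integers. Real ps_2_0 shaders have no integer bit operations and would typically do the equivalent select with floating-point multiplies, so this only mirrors the idea in the pseudocode: both paths always run, and masks throw one result away.

```python
MASK32 = 0xFFFFFFFF  # all ones in 32 bits

def some_calculations(x: int) -> int:            # hypothetical stand-in
    return (x * 3 + 1) & MASK32

def some_different_calculations(x: int) -> int:  # hypothetical stand-in
    return (x ^ 0x5555) & MASK32

def select_without_branch(which_case: int, m: int, n: int, x: int) -> int:
    # Both paths are evaluated no matter which case applies.
    a = some_calculations(x)
    b = some_different_calculations(x)
    # A comparison is True/False (1/0); multiplying gives an all-ones or all-zeros mask.
    mask_m = (which_case == m) * MASK32
    mask_n = (which_case == n) * MASK32
    return (mask_m & a) | (mask_n & b)

print(select_without_branch(0, 0, 1, 10))  # 31
print(select_without_branch(1, 0, 1, 10))  # 21855 (10 ^ 0x5555)
```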
Now with dynamic branching:
if (case == m)
c = some calculations
else if (case == n)
c = some different calculations
See? You don't have to do redundant calculations or any of the stupid bit-masking stuff when dynamic branching is available. In this crude illustration, dynamic branching doubles performance, because only one of the two calculations runs per pixel instead of both (assuming the hardware platform has no visible latencies, which isn't a realistic assumption).
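The "doubles performance" claim can be checked by counting how often the expensive paths run. A minimal sketch, assuming two hypothetical paths of equal cost and cases m = 0 and n = 1 alternating across pixels. It counts work only; on real GPUs, pixels processed together that take different branches can still end up paying for both paths.

```python
from collections import Counter

work = Counter()  # how many times an expensive path is evaluated

def path_a(x: int) -> int:  # hypothetical "some calculations"
    work["evaluations"] += 1
    return x * 3 + 1

def path_b(x: int) -> int:  # hypothetical "some different calculations"
    work["evaluations"] += 1
    return x ^ 0x5555

def shade_without_branching(which_case: int, x: int) -> int:
    a, b = path_a(x), path_b(x)         # both paths always run
    return a if which_case == 0 else b  # stand-in for the mask-and-combine select

def shade_with_branching(which_case: int, x: int) -> int:
    if which_case == 0:
        return path_a(x)                # only the needed path runs
    return path_b(x)

def evaluations_for(shader, pixels) -> int:
    work.clear()
    for which_case, x in pixels:
        shader(which_case, x)
    return work["evaluations"]

pixels = [(i % 2, i) for i in range(1000)]
print(evaluations_for(shade_without_branching, pixels))  # 2000
print(evaluations_for(shade_with_branching, pixels))     # 1000
```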
I think the biggest advantage of PS3 isn't speed but ease of use and versatility. Any programmers reading this no doubt take dynamic branching for granted; programming without it seriously sucks. Dynamic branching also makes practical shaders that were previously impractical because calculating too many different cases cost too much performance.