Pentium 4 is an architecture that is best at "instruction throughput". As long as the instruction stream keeps flowing through the processor, the P4 performs beautifully. However (and this is more visible with the newer P4 architecture), the long pipeline makes it very costly to run code with lots of jumps.
The idea is that one instruction enters the processor, walks through all the stages in the pipeline (well, a good part of them at least) and is retired. The microprocessor doesn't know in advance which instruction it will have to run next, so it assumes it is the one in the next memory location. When there is a jump, a P4 must discard all the instructions that were loaded and partially executed before the jump address was known. An Athlon64 must do just the same, BUT its pipeline is shorter, so the next instruction gets executed sooner.
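To put rough numbers on the flush cost, here is a toy model in Python. It is only a sketch: it assumes one instruction retires per cycle while the stream is flowing, and that every mispredicted jump throws away a fixed number of cycles. The function name, the workload size and the 2% misprediction figure are made up for illustration; the 25 and 10 cycle penalties are the worst-case figures quoted in the next paragraph.

```python
# Toy model: one instruction retires per cycle while the stream flows,
# and each mispredicted jump wastes `flush_penalty` cycles of discarded work.
# The function name and the workload numbers are hypothetical.

def total_cycles(instructions, mispredicted_jumps, flush_penalty):
    """Cycles needed to retire the stream, including pipeline flushes."""
    return instructions + mispredicted_jumps * flush_penalty

instructions = 1_000_000
mispredicted = 20_000  # assumption: 2% of instructions are mispredicted jumps

deep = total_cycles(instructions, mispredicted, 25)     # long pipeline (P4-like)
shallow = total_cycles(instructions, mispredicted, 10)  # short pipeline (Athlon64-like)

print(f"long pipeline:  {deep} cycles")
print(f"short pipeline: {shallow} cycles")
```

With these made-up numbers the long pipeline needs 1,500,000 cycles against 1,200,000 for the short one. With zero mispredicted jumps the two counts are identical, which is the "friendly code" case.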
Pentium4 (the processor itself) is penalized in the worst case by 25-something clocks when a jump instruction "fools" it. The Athlon64 is penalized by 10-something clocks. So, taking into consideration the longer clock cycle of the Athlon64, an Athlon can miss three jumps in the time a P4 misses two. Quite a difference.
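The three-to-two figure can be checked with a bit of arithmetic. In the sketch below the penalties are the ones from the paragraph above, while the clock speeds (3.0 GHz for the P4, 1.8 GHz for the Athlon64) are my own assumption, picked only as a plausible pairing; other pairings give a somewhat different ratio.

```python
# Time lost on one mispredicted jump, in nanoseconds.
# Penalties (25 and 10 clocks) are from the post; clock speeds are assumed.

def mispredict_cost_ns(penalty_cycles, clock_ghz):
    """Cycles divided by cycles-per-nanosecond gives nanoseconds."""
    return penalty_cycles / clock_ghz

p4_ns = mispredict_cost_ns(25, 3.0)   # assumed 3.0 GHz Pentium 4
a64_ns = mispredict_cost_ns(10, 1.8)  # assumed 1.8 GHz Athlon64

ratio = p4_ns / a64_ns
print(f"P4: {p4_ns:.2f} ns, Athlon64: {a64_ns:.2f} ns, ratio: {ratio:.2f}")
```

Under those assumed clocks one missed jump costs the P4 about 8.3 ns and the Athlon64 about 5.6 ns, a ratio of about 1.5, so the Athlon64 can absorb three misses in the time the P4 absorbs two.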
Also, the Athlon has some advantages in the time it takes to bring data from main memory, and is well served by memory with fast access. The Pentium4 tries to load as much data as possible, as long as it might be needed. This requires a lot of bandwidth. The difference in speed between an Athlon 64 with single channel memory and one with dual channel memory is quite small (10% at most from doubling the bandwidth), and the difference between AM2 and Athlon64 with dual channel DDR is not significant (again a doubling of bandwidth). The P4 of the 1600MHz variety, on the other hand, was helped a lot going from single channel SDR to dual channel SDR and later (2400+MHz) to dual channel DDR.
So, in the end, as long as your code is friendly to the P4, it runs better there than on an Athlon64 - and Photoshop plugins are very friendly to the P4. The Athlon64 runs much better with non-optimized code (non-optimized for the P4, that is).