It completely depends on the program, nfs4.
Basically, here is a rough sketch of how the SIMD units work inside the Athlon.
You have 8 general FP stack registers.
You have 8 MMX registers that are mapped onto (they are the same as) the FP registers. So you can't run MMX and FP code at the same time. To switch from MMX back to FP you need to issue the EMMS instruction. Actually this was one of the big deals in the Athlon: the EMMS instruction was reduced from very expensive to basically free, which made it more profitable to use 3DNow! freely.
You have 8 3DNow! registers mapped onto the FP registers as well, so you can't execute 3DNow! and FP code at the same time. (It's been a while since I wrote 3DNow! assembly, but I believe you can do 3DNow! and MMX together.)
You have 8 SSE registers that are independent of the MMX/FP registers. So you could theoretically interleave SSE and MMX/3DNow!/FP code together. Why you would want to is another matter...
The big point is, the Athlon does NOT have separate execution units for each of these. SSE/3DNow!/FP/MMX opcodes share the floating-point execution units, just as they share load/store units, etc. So although you could write your program to use SSE and FP code at the same time, for example, it wouldn't be any faster, because you're basically round-robining on the same FP execution resources.
From the Athlon design guide (written before the XP):
"The Athlon has 3 floating-point logic pipelines.
The first is the adder pipe, which performs 3dNow! Add, MMX ALU/Shifter, and FP add.
The 2nd is the mul pipe, which performs 3dNow/MMX mul&reciprocal, and FP mul/div/sqrt.
The 3rd is the load/store pipe, which loads and stores data for all SIMD/FP types"
So... which one will be used in a program that supports all three (MMX, 3DNow! and SSE)? Depends. SSE and 3DNow! do NOT replace MMX; they do different things and are useful for different purposes. But between SSE and 3DNow!, it basically comes down to how the programmers have written their detection routines, i.e. which extension they look for first.
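To make the "which extension they look for first" point concrete, here is a rough sketch in C of two detection routines. The function names and the enum are made up for illustration, not taken from any real program. The bit positions are the documented CPUID feature bits (MMX and SSE in EDX of function 1, 3DNow! in EDX of extended function 0x80000001), and the sketch assumes the caller has already run CPUID and passes in the two EDX values.

```c
#include <stdint.h>

/* Documented CPUID feature bits. */
#define CPUID_STD_EDX_MMX   (1u << 23)  /* function 1, EDX */
#define CPUID_STD_EDX_SSE   (1u << 25)  /* function 1, EDX */
#define CPUID_EXT_EDX_3DNOW (1u << 31)  /* function 0x80000001, EDX (AMD) */

/* Hypothetical set of code paths a program might ship. */
typedef enum { PATH_X87, PATH_MMX, PATH_3DNOW, PATH_SSE } simd_path;

/* Hypothetical routine that looks for SSE before 3DNow!. */
simd_path pick_sse_first(uint32_t std_edx, uint32_t ext_edx)
{
    if (std_edx & CPUID_STD_EDX_SSE)   return PATH_SSE;
    if (ext_edx & CPUID_EXT_EDX_3DNOW) return PATH_3DNOW;
    if (std_edx & CPUID_STD_EDX_MMX)   return PATH_MMX;
    return PATH_X87;
}

/* Same checks, but 3DNow! is tested before SSE. */
simd_path pick_3dnow_first(uint32_t std_edx, uint32_t ext_edx)
{
    if (ext_edx & CPUID_EXT_EDX_3DNOW) return PATH_3DNOW;
    if (std_edx & CPUID_STD_EDX_SSE)   return PATH_SSE;
    if (std_edx & CPUID_STD_EDX_MMX)   return PATH_MMX;
    return PATH_X87;
}
```

On a chip that reports all three bits (like the Athlon XP), the first routine ends up on the SSE path and the second on the 3DNow! path, even though the hardware is identical. On an older Athlon without the SSE bit, both land on 3DNow!.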