Yeah, better all or nothing. The SH4 is nothing compared to modern CPUs. Selectively sending individual operations to the GPU would add something like 1000x in driver and API overhead; it's faster to just carry out the instruction on the CPU, especially since SSE and the like have equivalent instructions.
Even if the host CPU had to emulate a complex SIMD instruction with several native instructions, it would take at least as many instructions, and probably many more, to set up an API call to the GPU and repack the data. On top of that comes the time wasted on a thread context switch, on waiting for the OS, driver, and GPU, and on stalling the emulator core to keep things in order. Highly inefficient.
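To make that concrete, here's a rough sketch of emulating one guest SIMD instruction directly on the host. I'm using the SH4's FTRV (4x4 matrix times vector) as the example. The function name `emu_ftrv` and the column-major layout of the matrix registers are my assumptions for illustration, not code from any real emulator.

```c
#if defined(__SSE__)
#include <xmmintrin.h>
#endif

/* Hypothetical helper: emulate an FTRV-style transform, fv = xmtrx * fv.
   xmtrx is assumed to hold 16 floats in column-major order. */
void emu_ftrv(const float xmtrx[16], float fv[4])
{
#if defined(__SSE__)
    /* Four multiplies and three adds on 128-bit registers. */
    __m128 acc = _mm_mul_ps(_mm_loadu_ps(xmtrx + 0), _mm_set1_ps(fv[0]));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(xmtrx + 4), _mm_set1_ps(fv[1])));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(xmtrx + 8), _mm_set1_ps(fv[2])));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(xmtrx + 12), _mm_set1_ps(fv[3])));
    _mm_storeu_ps(fv, acc);
#else
    /* Scalar fallback for hosts without SSE; same result. */
    float out[4];
    for (int i = 0; i < 4; ++i)
        out[i] = xmtrx[i] * fv[0] + xmtrx[4 + i] * fv[1]
               + xmtrx[8 + i] * fv[2] + xmtrx[12 + i] * fv[3];
    for (int i = 0; i < 4; ++i)
        fv[i] = out[i];
#endif
}
```

That's a handful of native instructions with no API call, no data repacking, and no synchronization, which is the whole point.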
Now, for systems like the N64 and PS2, with programmable GPU microcode and standalone vector units, cross-assembling the microcode into GPU shaders would be perfectly suited to the task. VU1 on the PS2 is very much like DX11 (it can generate geometry, do real-time tessellation and subdivision, branching and looping, recursion, custom skinning/boning, VIF packing/unpacking of vector data, etc.). These things couldn't be GPU accelerated on the PC until DX11 because of API limitations alone.
As it is, emulation of old consoles is so fast that the emulator has to be intentionally throttled to 30 fps anyway. There would be no benefit from higher performance unless you were trying to lower the requirements to target older or slower host platforms, and those won't support DX11 and GPGPU anyway.
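For what it's worth, the throttling itself is trivial. A minimal sketch, assuming the emulator measures how long each frame took and then waits out the remainder of the frame budget (`frame_wait_ns` is a made-up name, and how you actually sleep is platform specific):

```c
#include <stdint.h>

/* Hypothetical helper: how many nanoseconds to wait before presenting
   the next frame, given how long emulating the current one took. */
int64_t frame_wait_ns(int64_t elapsed_ns, int target_fps)
{
    const int64_t budget_ns = 1000000000LL / target_fps;
    /* If the frame already overran its budget, don't wait at all. */
    return elapsed_ns < budget_ns ? budget_ns - elapsed_ns : 0;
}
```

At 30 fps the budget is about 33 ms per frame, so a frame that emulates in 2 ms leaves the host idle for roughly 31 ms.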