It depends on what you're doing, how the task can be accomplished, and how the application is written. Sometimes there's not much difference, sometimes there is.
Somewhere along the line, something in hardware or software has to decide what gets processed where, and that logic slows things down while it's deciding. It's more added latency than slower processing, but the result is that on a single processor, the task gets done sooner. In addition, some processing time is taken away from the primary task while "something" is handling the multitasking schedule. When you take cycles away from a task, it takes longer to execute. Try Seti (GUI version) versus Seti (CLI version): on a single processor, maintaining the GUI and doing the graphical output just about doubles the time.
Seti is not exactly a perfect example, because whether the results are on-screen or not, the calculations for the display are still being done. The example was to illustrate the point that some cycles get taken away for other tasks, and the ultimate result is that the "main" process takes longer to finish.
You get just so many cycles; the more you spread 'em around, the less processing-per-task is gonna happen (plus the administrative time spent switching tasks).
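
To put rough numbers on that last point, here's a back-of-the-envelope sketch in Python. It's a toy model, not a measurement: the function name, the equal sharing between tasks, and the flat "switch overhead" fraction are all my own assumptions for illustration.

```python
def time_to_finish(work_cycles, cycles_per_sec, n_tasks, switch_overhead=0.0):
    """Seconds for one task needing `work_cycles` to finish on one CPU.

    Toy model (assumptions, not how any real scheduler works):
      - the CPU is shared equally among `n_tasks`
      - a fixed fraction `switch_overhead` (0.0 to <1.0) of all cycles
        is lost to task switching and scheduling
    """
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    if not 0.0 <= switch_overhead < 1.0:
        raise ValueError("switch_overhead must be in [0.0, 1.0)")

    usable = cycles_per_sec * (1.0 - switch_overhead)  # cycles left after admin work
    per_task = usable / n_tasks                        # each task's share per second
    return work_cycles / per_task


if __name__ == "__main__":
    work = 1000   # cycles the "main" task needs (made-up number)
    speed = 100   # cycles the CPU delivers per second (made-up number)

    print("alone:              ", time_to_finish(work, speed, 1))
    print("sharing with 1 task:", time_to_finish(work, speed, 2))
    print("sharing + overhead: ", time_to_finish(work, speed, 2, switch_overhead=0.25))
```

With the task alone it gets every cycle; add a second task with an equal share and the main task takes twice as long, and any switching overhead pushes it out further still.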
FWIW
Scott