Duallies definitely do better in any multitasking, SMP-capable operating system. There are two main reasons for this:
1) Context switches. In a modern (pre-emptive) operating system, when more than one process is assigned to run on the same CPU, the operating system's task scheduler has to hand out a timeslice to each application in turn every millisecond or so. When a process's timeslice expires, the task scheduler puts that process on hold and does a "context switch"--basically storing the entire CPU state left by that process, and switching to the CPU state saved for the next process destined to receive a timeslice. All the work involved in context switching incurs a pretty stiff CPU performance penalty that makes the system less responsive overall.
Having more than one CPU allows the task scheduler to keep multiple processes running simultaneously without having to perform so many context switches, simply because each processor can run a process in parallel (duh). If you have two processes running that each completely saturate a CPU, it's actually theoretically possible for a pair of 400MHz CPUs to handle the two processes with better performance than a single 800MHz CPU, simply because the dual-CPU system doesn't have to pull off so many context switches. (This "theoretical" scenario doesn't occur often in the real world, because of issues like less-than-full CPU saturation, shared memory bandwidth, etc.)
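To put rough numbers on the 400MHz-pair-versus-800MHz comparison, here's a toy model in Python. It's only a sketch: the function name, the 1000-microsecond timeslice, and the 50-microsecond switch cost are made-up illustrative values, not measurements, and the model ignores the shared memory bandwidth and partial-saturation issues mentioned above.

```python
def useful_mcycles_per_sec(clock_mhz, cpus, runnable, timeslice_us, switch_cost_us):
    """Toy model: total useful megacycles per second across all CPUs.

    Assumes every runnable process fully saturates a CPU and that a
    context switch burns a fixed amount of wall-clock time (hypothetical).
    """
    if runnable <= cpus:
        # Each process gets its own CPU, so this model charges no switches.
        return clock_mhz * runnable
    # Otherwise each timeslice is followed by a switch that does no useful work.
    useful_fraction = timeslice_us / (timeslice_us + switch_cost_us)
    return clock_mhz * cpus * useful_fraction


# Two CPU-bound processes, assumed 1000 us timeslice and 50 us switch cost.
single = useful_mcycles_per_sec(800, cpus=1, runnable=2,
                                timeslice_us=1000, switch_cost_us=50)
dual = useful_mcycles_per_sec(400, cpus=2, runnable=2,
                              timeslice_us=1000, switch_cost_us=50)

print(f"single 800MHz: {single:.1f} useful Mcycles/s")
print(f"dual 400MHz:   {dual:.1f} useful Mcycles/s")
```

With those assumed numbers the single CPU loses about 5% of its cycles to switching and the pair loses none, so the pair comes out slightly ahead. Shrink the switch cost and the gap closes, which is why this stays mostly theoretical.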
2) Non-preemptible kernel code. Many modern operating systems run much of their kernel-space code (IRQ handlers especially) in a non-preemptible state. This means that once this kernel-space code has possession of the CPU (like when your NIC receives some data), the task scheduler can't force the kernel-space code to give up the CPU until the kernel-space work is done. Without multiple CPUs, this effectively means that every process is on hold until the kernel has done its thing. This can lead to things like your mouse skipping or your keyboard stopping while your analog modem dials a number or your printer spools a page. That sort of thing gets annoying if you're trying to keep up a steady stream of keyboard input.
In the x86 world, SMP scalability is not all that great; a dual P3 1GHz might have about the same average throughput as a single P3 1.4GHz, rather than a single P3 2GHz (nonexistent, purely theoretical CPU there). In terms of latency, however, the actual responsiveness of the dual-CPU system is often much better.
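For what it's worth, the dual P3 example works out to roughly 70% scaling efficiency. A quick Python sketch of the arithmetic, where the function name is just illustrative and the 1.4GHz figure is the ballpark guess from above, not a benchmark result:

```python
def smp_scaling(single_equiv_ghz, per_cpu_ghz, cpus):
    """Return (speedup over one CPU, fraction of ideal linear scaling)."""
    speedup = single_equiv_ghz / per_cpu_ghz
    efficiency = speedup / cpus
    return speedup, efficiency


# Dual 1GHz box that performs like a single 1.4GHz CPU (rough estimate).
speedup, efficiency = smp_scaling(1.4, 1.0, cpus=2)
print(f"speedup over one CPU: {speedup:.2f}x")   # 1.40x
print(f"scaling efficiency:   {efficiency:.0%}")  # 70%
```

Perfect scaling would be 2.00x and 100%. None of this captures latency, which is where the second CPU tends to pay off.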