Generally, the Pentium 4 EE performed almost identically to the Pentium 4. It was literally a Pentium 4 with 2 MB of L3 cache taped on.
Almost identically, except in practice, with background applications running and the like. Fine, if you were one of those people who "don't need antivirus" or "don't multitask," I guess. It also wasn't literally a Pentium 4 with 2 MB of cache added; it was very much a Xeon in a desktop socket. The dies were going to be made for the Xeon line either way.
As for multitasking, Pentium D was far superior to any Pentium 4. The implementation of HyperThreading (especially prior to Prescott) wasn't great. I distinctly recall the 3.4 EE generally being comparable to an Athlon 64 3200+, which isn't really a "great job" considering the price difference.
Considering the price difference, no. It was only a good option if you were a fanboy, or stuck with the big OEMs. The A64 X2s were plain better, and forced Intel to lean on pricing and distribution tactics until it could ship the C2D (Yonah wasn't 64-bit, and never got fast enough for desktop or server dominance).
But seriously, go fire one up, along with a nearby dedicated A/C unit, and use it. The extra cache significantly improves performance in the way we normally use computers, with things like AV clients and browsers running in the background. If you just go by common review scores of the period, 845 chipsets with PC133 were <20% slower, too, no different from getting a slightly slower CPU (when, in reality, you could mistake a ~2.4B for a Celeron; I, and others, have done just that with PCs from that era). Reality didn't match the reviews, except in very simplistic scenarios. I can't find them offhand, but some forum members went about contriving cases that tried to measure that kind of real performance (including max wait time, which would be min FPS in games).
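To show why max wait time tells you something an average score hides, here is a minimal Python sketch. The two frame-time traces are made up for illustration (they are not measurements from any of these CPUs), and `summarize` is just a hypothetical helper name.

```python
from statistics import mean

def summarize(frame_times_ms):
    """Return (average FPS, minimum FPS) for a list of per-frame times in ms."""
    avg_fps = 1000.0 / mean(frame_times_ms)
    # The longest single frame is the "max wait time"; it sets the min FPS.
    min_fps = 1000.0 / max(frame_times_ms)
    return avg_fps, min_fps

# Hypothetical traces: same total time, very different worst case.
smooth = [20.0] * 100            # every frame takes 20 ms
hitchy = [18.0] * 99 + [218.0]   # mostly faster, with one long stall

smooth_avg, smooth_min = summarize(smooth)
hitchy_avg, hitchy_min = summarize(hitchy)

print(f"smooth: avg {smooth_avg:.1f} FPS, min {smooth_min:.1f} FPS")
print(f"hitchy: avg {hitchy_avg:.1f} FPS, min {hitchy_min:.1f} FPS")
```

Both traces average out to the same FPS, so a review score built on averages would call them equal, while the second one has a stall you would certainly notice in use.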
The problem with the Pentium D was that if you had a program that used shared data, bouncing it between caches hurt performance, as did Windows' non-sticky scheduling, because all of those accesses had to go through the NB (CPU 0 check -> NB -> CPU 1 evict -> CPU 0 load). You wouldn't see that in most benchmarks, though, because it really is harder to measure without doing a trace and replicating it, as identically as possible, across systems. With two separate threads, not sharing data structures, and with no other applications to make the caches cold in between slices, the Pentium D looked a lot better than it did in practice. Now, that said, it was a good value if your electricity was cheap: they were priced very well for what they were, and they OCed nicely (the P4EE was priced based on what Intel could get for it, since Intel, unlike AMD, wasn't supply-limited).
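The cache-bouncing path can be sketched as a toy count of bus round trips. This is only an illustration under loud assumptions: a line lives in at most one core's cache at a time, caches never run out of space, and every access is a write. It is not a model of the real coherence protocol, and `count_bus_trips` is a made-up name.

```python
def count_bus_trips(accesses):
    """Count accesses that have to go out over the shared bus.

    accesses: iterable of (core, line) write accesses, in order.
    Toy rule (an assumption): a line is held by at most one core at a time.
    """
    owner = {}  # line -> core that currently holds it
    trips = 0
    for core, line in accesses:
        if owner.get(line) != core:
            # Cold miss, or the other core holds the line:
            # check -> NB -> other core evicts -> this core loads.
            trips += 1
            owner[line] = core
    return trips

# Two cores taking turns on one shared line.
shared = [(i % 2, "counter") for i in range(1000)]
# Two cores each working on their own line.
private = [(i % 2, f"counter{i % 2}") for i in range(1000)]

shared_trips = count_bus_trips(shared)
private_trips = count_bus_trips(private)
print("shared data: ", shared_trips, "bus trips")
print("private data:", private_trips, "bus trips")
```

In this model a single thread that the scheduler keeps moving between cores behaves like the shared case, since its data has to follow it across the bus each time, which is why a benchmark with two independent, pinned-in-practice threads shows none of the cost.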
Such performance issues could readily be measured with server software (max request time is commonly a very important metric, and performance tracking is often built into applications, to be used while they are running in production). That, along with the FSB limitations, was among the reasons for the bigger caches on common Xeons (2 CPUs, or a multiprocessing task, could be enough to benefit in an easily benchmarkable manner).
Luckily, today we have effective multithreaded and multitasking benchmarks that do a good job of matching up with actual use, for the most part. Bulldozer would have looked a lot better with common benchmarks and benchmark suites from the mid-00s or earlier, too.