I was speaking for Intel... no, that's not true. I just roughly remembered what is sometimes seen in papers.
But based on your measurements, Intel's current cores, with their improved memory access handling compared to Thuban etc., should land somewhere in the range I gave.
I'm not surprised about Prime95. <storymode>While discussing K8 optimizations with George Woltman and others, I learned a lot of interesting things about the SSE2 (P4) optimizations. Prime95's split-radix FFTs were 3 times faster than the next fastest lib (FFTW, IIRC). George not only scheduled the SSE2 instructions (and FFT radixes, of course) to perfection, but also did nice tricks with TLBs, etc.