This is from 9/02, but I don't recall seeing it discussed here. Has anyone else heard of this? Looks like one more reason to be sure to get a C1-stepping P4.
Geek.com
Many have noticed that the 2.8GHz P4 released recently had some fairly substantial performance increases. Well, it's not just because it's faster in MHz alone. Intel has now fixed a performance-decreasing bug that has plagued P4s and Xeons since the get-go. The bug was a slippery little devil that comes up only under certain circumstances, but it causes prefetch operations to fail.

Hardware prefetch is one of the numerous speed-up mechanisms employed by modern superscalar processors. It tries to look ahead to see what data might be needed next. When it finds some, it starts the load several steps in advance, so that when the data is actually needed later on, it's already there. This hides some of the latency of reading data from main memory or the cache. Intel's IA-32 documentation refers to them as temporal loads, or loads issued somewhat ahead of the time they'll actually be needed in code.

Prefetching works great when it works, but Intel's P4s and Xeons (prior to the C1 stepping) had a flaw that affected prefetch operations by corrupting the data under certain circumstances. The C1 stepping has now fixed the problem, and Intel's processors today can run with prefetching fully enabled. The erratum is described by Intel on page 28, item "O37" of this PDF. It states:
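The hardware prefetcher does its look-ahead automatically in silicon, so there's no code involved. As a rough software analogy only, here's the same idea in C using the GCC/Clang `__builtin_prefetch` hint: while summing an array, it asks for data a fixed number of elements ahead so it's (hopefully) in cache by the time the loop gets there. The look-ahead distance of 16 is an arbitrary, hypothetical value, and `sum_with_prefetch` is just an illustrative name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical look-ahead distance in elements; a real hardware
   prefetcher chooses its own distance, this value is only illustrative. */
#define PREFETCH_DISTANCE 16

/* Sum an array while hinting, PREFETCH_DISTANCE elements ahead, that the
   data will be needed soon. __builtin_prefetch is a GCC/Clang builtin;
   it is only a hint and never changes the result. */
long sum_with_prefetch(const long *data, size_t n)
{
    assert(data != NULL || n == 0);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* args: address, 0 = read access, 3 = high temporal locality */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
        }
        total += data[i];
    }
    return total;
}
```

Because a prefetch is only a hint, a correct prefetcher can change timing but never results. That's what made the P4 bug nasty: stale data coming back changes the answer, not just the speed.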
Problem: The processor may use stale data from the cache while the Hardware Prefetcher is enabled.
Workaround: Disable the Hardware Prefetcher by setting bit 9 in register IA32_MISC_ENABLE - MSR Address 01A0h via the BIOS.
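For anyone curious what "setting bit 9" means in practice, here's a minimal sketch of just the bit arithmetic, assuming the usual Intel convention that bit 0 is the least significant bit. It only computes the new register value and doesn't touch hardware; the function names are hypothetical. Actually writing the MSR needs ring-0 access (the BIOS, or the OS via WRMSR).

```c
#include <assert.h>
#include <stdint.h>

/* MSR address and bit position as given in the erratum text. */
#define IA32_MISC_ENABLE            0x1A0u
#define HW_PREFETCHER_DISABLE_BIT   9u

/* Return v with the given bit (0 = least significant) set. */
static uint64_t set_bit64(uint64_t v, unsigned bit)
{
    assert(bit < 64);
    return v | ((uint64_t)1 << bit);
}

/* Compute the value to write back to IA32_MISC_ENABLE so that the
   hardware prefetcher is disabled, leaving every other bit unchanged. */
uint64_t misc_enable_with_prefetch_disabled(uint64_t current)
{
    return set_bit64(current, HW_PREFETCHER_DISABLE_BIT);
}
```

The read-modify-write matters: IA32_MISC_ENABLE holds several unrelated feature bits, so the workaround ORs in 0x200 rather than overwriting the register.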
In summary, many of the performance increases we're seeing with Intel's latest P4s (and Xeons) come from the fact that Intel can now use their hardware prefetch mechanisms. Read more at The Inquirer.
Rob's Note: AnandTech saw a 2% speed-up in their tests of the new core vs. the old core at the same clock speed. The letter to The Inquirer states the bug fixes "... may be the source of the amazing increase in performance." This refers to an earlier Inquirer article entitled "P4 2.80GHz Bapcocked to 3.917GHz," in which liquid nitrogen was used to overclock the chip. So there's a slight performance increase at regular clock speeds, and the bug fix possibly accounts for the ability to clock the P4 much higher.