P4 Data Prefetch bug prior to C1 stepping?

oldfart

Lifer
Dec 2, 1999
10,207
0
0
This is from 9/02, but I don't recall seeing it discussed here. Has anyone else heard of this? Looks like one more reason to be sure to get a C1 stepping P4.
Geek.com
Many have noticed that the 2.8GHz P4 released recently had some fairly substantial performance increases. Well, it's not just because it's faster in MHz alone. Intel has now fixed a performance-decreasing bug that's plagued P4s and Xeons since the get-go. The bug was a slippery little devil which comes up only under certain circumstances but causes prefetch operations to fail.

Hardware prefetch is one of the numerous speed-up mechanisms employed by modern superscalar processors. It tries to look ahead to see what data might be needed next. When it finds some, it starts the load several steps in advance, so when it's actually needed later on it's already there. This removes some of the latency associated with data reads from main memory or the cache. Intel's IA-32 documentation refers to them as temporal loads, or loads that are altered slightly from the time they'll actually be needed in code.

Prefetching works great when it works, but Intel's P4s and Xeons (prior to the C1 stepping) had a flaw that affected prefetch operations by corrupting the data under certain circumstances. The C1 stepping has now fixed the problem, and Intel's processors today can run with fully enabled prefetching. The erratum is described by Intel on page 28, item "O37" of this PDF. It states:

Problem: The processor may use stale data from the cache while the Hardware Prefetcher is enabled.
Workaround: Disable the Hardware Prefetcher by setting bit 9 in register IA32_MISC_ENABLE - MSR Address 01A0h via the BIOS.
In summary, many of the performance increases we're seeing with Intel's latest P4s (and Xeons) come from the fact that Intel can now use their hardware prefetch mechanisms. Read more at The Inquirer.
Rob's Note: AnandTech saw a 2% speed-up in their tests of the new core vs. the old core at the same clock speed. The letter to The Inquirer states the bug fixes "... may be the source of the amazing increase in performance." This refers to an earlier The Inquirer article entitled "P4 2.80GHz Bapcocked to 3.917GHz," where liquid nitrogen was used to overclock it to 3.917GHz. So, there's a slight performance increase at regular clock speeds, and possibly the bug fix accounts for the ability to clock the P4 much higher.
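
For anyone wondering what "setting bit 9" in the quoted workaround amounts to, here is a rough Python sketch of the bit arithmetic. The MSR address (01A0h) and the bit number come from the erratum text above; the function names are made up for illustration, and `read_msr` assumes a Linux box with the `msr` kernel module loaded and root access, which is not something the article describes. Treat it as a sketch, not as how any BIOS actually does it.

```python
import os
import struct

IA32_MISC_ENABLE = 0x1A0      # MSR address quoted in the erratum
HW_PREFETCH_DISABLE_BIT = 9   # bit named in the workaround


def prefetcher_disabled(msr_value: int) -> bool:
    """Return True if bit 9 is set, i.e. the hardware prefetcher is off."""
    return bool((msr_value >> HW_PREFETCH_DISABLE_BIT) & 1)


def set_prefetcher_disabled(msr_value: int, disabled: bool) -> int:
    """Return a copy of msr_value with bit 9 set (disabled) or cleared."""
    mask = 1 << HW_PREFETCH_DISABLE_BIT
    return (msr_value | mask) if disabled else (msr_value & ~mask)


def read_msr(cpu: int = 0, address: int = IA32_MISC_ENABLE) -> int:
    """Read a 64-bit MSR through /dev/cpu/N/msr.

    Assumption: Linux, msr module loaded, running as root.
    """
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The msr device returns 8 bytes at the offset equal to the MSR address.
        return struct.unpack("<Q", os.pread(fd, 8, address))[0]
    finally:
        os.close(fd)
```

On a suitable machine, `prefetcher_disabled(read_msr())` would tell you whether the BIOS left the prefetcher switched off.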
 

motojeff

Member
Mar 21, 2002
115
0
76
Wouldn't you need a BIOS update to turn data prefetching back on to see an improvement? Not sure how just swapping the CPUs alone would show the speed-up due to this bug. Sounds like software needs to re-enable it.
 

Ketchup

Elite Member
Sep 1, 2002
14,559
248
106
Originally posted by: motojeff
Wouldn't you need a BIOS update to turn data prefetching back on to see an improvement? Not sure how just swapping the CPUs alone would show the speed-up due to this bug. Sounds like software needs to re-enable it.

I see no evidence that data prefetching will not be enabled in all future BIOSes from the manufacturers. The workaround, disabling prefetching, is for chips older than C1, and it also only applies to the select few people out there who know how to write and program BIOS code.
 

ChampionAtTufshop

Platinum Member
Nov 15, 2002
2,667
0
0
Originally posted by: ketchup79
Originally posted by: motojeff
Wouldn't you need a BIOS update to turn data prefetching back on to see an improvement? Not sure how just swapping the CPUs alone would show the speed-up due to this bug. Sounds like software needs to re-enable it.

I see no evidence that data prefetching will not be enabled in all future BIOSes from the manufacturers. The workaround, disabling prefetching, is for chips older than C1, and it also only applies to the select few people out there who know how to write and program BIOS code.

Or someone programs a BIOS, puts it on the web, and others download and flash it, heh.
 

Duvie

Elite Member
Feb 5, 2001
16,215
0
71
I wonder if my data in my c1 review may be showing a bit of this....look at (B0)1.6a@2.7(169) vs (C1)2.4@2.7(150)....
 

DAPUNISHER

Super Moderator CPU Forum Mod and Elite Member
Super Moderator
Aug 22, 2001
31,620
31,387
146
Originally posted by: Duvie
I wonder if my data in my c1 review may be showing a bit of this....look at (B0)1.6a@2.7(169) vs (C1)2.4@2.7(150)....
That's why I bumped it ;)
 

Ketchup

Elite Member
Sep 1, 2002
14,559
248
106
Originally posted by: Duvie
I wonder if my data in my c1 review may be showing a bit of this....look at (B0)1.6a@2.7(169) vs (C1)2.4@2.7(150)....

I think you're right, Duvie. Thugsrook thought the voltage was having ill effects on your old chip, but I beg to differ. Your chip is at a lower FSB and still outperforms your 1.6 at the same speed, so there is definitely something to be said for prefetching, and maybe more so with Intel's next CPU.
 

THUGSROOK

Elite Member
Feb 3, 2001
11,847
0
0
i think i may have proof positive for you on that b0 prefetch bug....

ive got 5 chips sitting here: 3DMark2001

2.4B b0 @ 2.4 = 11987
1.8A b0 @ 2.4 = 11994
2.4B c1 @ 2.4 = 12101
1.8A c1 @ 2.4 = 12115
2.4B c1 @ 2.4 = 12111

all settings/drivers/temps were identical and tested only days apart.
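
To put a number on those five scores, here is a quick Python sketch that averages them by stepping and works out the percentage gap. The scores are copied from the list above; the helper names are just for illustration, and five runs is a small sample, so read the result as a rough indication only.

```python
from statistics import mean

# (chip, stepping, 3DMark2001 score at 2.4GHz), copied from the post above
scores = [
    ("2.4B", "b0", 11987),
    ("1.8A", "b0", 11994),
    ("2.4B", "c1", 12101),
    ("1.8A", "c1", 12115),
    ("2.4B", "c1", 12111),
]


def mean_by_stepping(rows):
    """Group scores by stepping and return the average for each."""
    grouped = {}
    for _chip, stepping, score in rows:
        grouped.setdefault(stepping, []).append(score)
    return {stepping: mean(vals) for stepping, vals in grouped.items()}


def percent_gain(old, new):
    """Relative improvement of new over old, in percent."""
    return (new - old) / old * 100


means = mean_by_stepping(scores)
gain = percent_gain(means["b0"], means["c1"])
print(f"b0 avg: {means['b0']}, c1 avg: {means['c1']}, gain: {gain:.2f}%")
```

That works out to roughly a 1% gain for C1 over B0 at the same clock in this benchmark.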
 

Ketchup

Elite Member
Sep 1, 2002
14,559
248
106
Plus the 2.5 and 2.6. Actually, it seems that all Northwoods are going to be C1s before too long.
 

Duvie

Elite Member
Feb 5, 2001
16,215
0
71
Thugs and I also may have seen this in our divx scores with sizeable increases that seem to go beyond increases in clock speed...
 

THUGSROOK

Elite Member
Feb 3, 2001
11,847
0
0
oh yes, very sizable in divx encoding.
so sizable in fact it was obvious.
we're talking a good -15mins for a full encode just from prefetch :Q
 

Duvie

Elite Member
Feb 5, 2001
16,215
0
71
Prefetch gave a sizeable increase when using DivX 5.02 Pro with the qp (quarter pixel) feature. Normally qp increases encode times, but with prefetching the time added by qp was minor. That is clearly where I have seen a 15+ min improvement in performance. qp is a heavy, very CPU-intensive feature that is great for stress testing a CPU!!!