AMD delays Phenom 2.4 GHz due to TLB errata

brxndxn

Diamond Member
Apr 3, 2001
8,475
0
76
Hopefully this is the whole reason AMD has been unable to ship higher clock speeds in volume...

 

nyker96

Diamond Member
Apr 19, 2005
5,630
2
81
They've had nothing but trouble lately. The 3xxx graphics card launch has been pretty decent so far, though; many cards sold out pretty quickly.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
The article claims there's no microcode workaround and product has to be recalled, but also says that there is a BIOS update to work around the problem. That doesn't make much sense to me.
 

DrMrLordX

Lifer
Apr 27, 2000
22,915
12,988
136
That is odd. Particularly this bit:

"Some 9500/9600 parts may even be overclocked to 2.6, 2.8, 2.9, 3.0 GHz and they will have no problems whatsoever, while some will have this error."

That does not sound good at all. On the other hand, word that Phenom will reach stock speeds as high as 3 GHz once B3 hits the market is encouraging.
 

zach0624

Senior member
Jul 13, 2007
535
0
0
Originally posted by: Phynaz
Theo Valich is a clown.

Ignore him.

The Inq. hasn't been right on many things regarding AMD and Phenom, so I wouldn't take this too seriously (remember that 3DMark record score?). Also, the claim that some 9500s and 9600s don't have the problem and hit 3 GHz is a little fishy.
 

DrMrLordX

Lifer
Apr 27, 2000
22,915
12,988
136
After reading Anandtech's Phenom review it would seem that B2 chips may indeed be having problems. Theirs certainly wasn't stable at high speeds, and they did cite the TLB problem (though they did not make the connection between the two).
 

Viditor

Diamond Member
Oct 25, 1999
3,290
0
0
Originally posted by: Phynaz
Theo Valich is a clown.

Ignore him.

This is scary...I agree completely. :Q

However, the data that he presented so incompetently and completely misunderstood is essentially correct (and you'll note that it's exactly what I've been saying for months now...).
 

Viditor

Diamond Member
Oct 25, 1999
3,290
0
0
Originally posted by: CTho9305
The article claims there's no microcode workaround and product has to be recalled, but also says that there is a BIOS update to work around the problem. That doesn't make much sense to me.

I'll quote from a more knowledgeable source than myself...

"There is microcode for the L3 controller separate from the main controller loop that is not updatable"
"In K7, all the memory controller microcode is generated during bootup by the BIOS. In K8 the same thing happens for the main memory controller loop, but there are some microcode routines for controllers (such as the L3 controller) which are not part of the main loop. These cannot be updated without a mask revision. As I understand it, there will be a BIOS update which allows the main loop to recover from a glitch in synchronization. If the problem occurs rarely, if at all--as is expected in 2.3 GHz and below CPUs--this is sufficient. But if it occurs constantly, you take a big performance hit. So rev B3 is needed to swat the problem at its source"
 

DrMrLordX

Lifer
Apr 27, 2000
22,915
12,988
136
So this is a problem with the L3 cache controller? Could this bug be side-stepped by disabling the L3 cache altogether?
 

harpoon84

Golden Member
Jul 16, 2006
1,084
0
0
Originally posted by: DrMrLordX
So this is a problem with the L3 cache controller? Could this bug be side-stepped by disabling the L3 cache altogether?

According to Fudzilla, yes you can, but you lose 10% performance, so it's hardly practical.
 

DrMrLordX

Lifer
Apr 27, 2000
22,915
12,988
136
Interesting. I wonder if B2 chips will OC better with their L3 cache disabled. Hmm.
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: harpoon84
Originally posted by: DrMrLordX
So this is a problem with the L3 cache controller? Could this bug be side-stepped by disabling the L3 cache altogether?

According to Fudzilla, yes you can, but you lose 10% performance, so it's hardly practical.

The L3 is worth 10% performance? Wow, that's pretty amazing. Got links to benchmarks that compare L3 enabled/disabled?
 

Idontcare

Elite Member
Oct 10, 1999
21,110
64
91
Originally posted by: CTho9305
Originally posted by: harpoon84
Originally posted by: DrMrLordX
So this is a problem with the L3 cache controller? Could this bug be side-stepped by disabling the L3 cache altogether?

According to Fudzilla, yes you can, but you lose 10% performance, so it's hardly practical.

The L3 is worth 10% performance? Wow, that's pretty amazing. Got links to benchmarks that compare L3 enabled/disabled?

I am surprised as well, as this would imply the bulk of the K10 IPC improvements relative to K8 come from cache hierarchy and not micro-architecture improvements.

I assumed L3$ would improve performance scaling as number of threads crossed the 2->3 boundary.

I.e. a dual-core K10 should perform at least as well as a dual-core K8 even with L3$ disabled or removed entirely. Shouldn't it?
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: Idontcare
Originally posted by: CTho9305
Originally posted by: harpoon84
Originally posted by: DrMrLordX
So this is a problem with the L3 cache controller? Could this bug be side-stepped by disabling the L3 cache altogether?

According to Fudzilla, yes you can, but you lose 10% performance, so it's hardly practical.

The L3 is worth 10% performance? Wow, that's pretty amazing. Got links to benchmarks that compare L3 enabled/disabled?

I am surprised as well, as this would imply the bulk of the K10 IPC improvements relative to K8 come from cache hierarchy and not micro-architecture improvements.

I assumed L3$ would improve performance scaling as number of threads crossed the 2->3 boundary.

I.e. a dual-core K10 should perform at least as well as a dual-core K8 even with L3$ disabled or removed entirely. Shouldn't it?

I would expect a dual-core Barcelona-based chip to match or kick the crap out of a K8 depending on the application. In particular, significant improvements are SSE128 (doubled SSE performance) and the doubled L1 bandwidth. One possibility is that some of the benchmarks are using codepaths that are highly-optimized for K8, and may not schedule operations efficiently to take advantage of Barcelona's enhancements. I don't really see how some of the synthetic benchmarks could reliably measure, say, floating point performance without using code tuned to each microarchitecture.

edit: To elaborate, obtaining decent performance from modern processors is easy. They'll do a certain amount of rescheduling of instructions if you don't do an optimal job, and they're relatively forgiving. Obtaining peak performance, however, is much more difficult. You have to track a lot of things - making sure operands are ready at the right time, keeping in mind how much work you have to make available to cover the latency of a cache access, keeping track of decode slot limitations (particularly on the Intel chips), etc.

Note: this next paragraph is based on my current understanding of the architectures, but I'm not sure about any numbers here and don't really know how to take advantage of SSE. Highly optimized code for K8 might keep execution units busy during a load operation by performing 2 128-bit SSE additions, which is 4 cycles of work - if the next instruction depends on the load, the code keeps the execution units busy 100% of the time. The same code sequence is sub-optimal on Barcelona/Phenom: the 2 additions would take only 2 cycles total, leaving the execution units idle for a cycle (loads take 3 cycles).

Pseudocode for the case I'm thinking of:
r1 = mem[1234] <- cache access; data won't come back for 3 cycles
r2 = r2 + r4 <- 128-bit packed add, 2 cycles on K8, 1 on Barcelona/Phenom
r3 = r3 + r5 <- 128-bit packed add, 2 cycles on K8, 1 on Barcelona/Phenom
r4 = r1 + r6 <- depends on the first instruction; r1 won't be ready yet on Barcelona/Phenom
xyz = abc <- some other instruction; unless it doesn't depend on r1 / r4, Barcelona and Phenom will have to spend a cycle doing nothing.

edit2: One thing that disappointed me is that reviewers didn't do much analysis of per-thread performance relative to K8. On Linux, I'd think it would be easy enough to keep the scheduler from using the 3rd and 4th core; there might be ways to do it on Windows too (worst case, 2 instances of while(1); run at high priority with affinity set to particular cores?).
 

DrMrLordX

Lifer
Apr 27, 2000
22,915
12,988
136
Originally posted by: CTho9305

The L3 is worth 10% performance? Wow, that's pretty amazing. Got links to benchmarks that compare L3 enabled/disabled?

I was thinking the same thing and would like to see those numbers. With the L3 cache supposedly locked at 2 GHz in B2 chips, and with the way system memory performance scales so well at higher clock speeds in K8 chips (something that will hopefully be true in K10 chips as well), I would think that a heavily-overclocked B2 stepping Phenom would gain very little from its L3 cache.