More info regarding Phenom TLB issues

harpoon84

Golden Member
Jul 16, 2006
1,084
0
0
http://www.techreport.com/discussions.x/13724

Key points

*The TLB bug affects all current 9500/9600 Phenoms, not just the preemptively recalled 9700 model as first reported. A BIOS fix that addresses the stability issues is in the works, but at the expense of 10% (or more) of overall performance.

*Newer models featuring a hard fix for the TLB issue (not a BIOS/microcode update) will be designated Phenom 9550 and 9650.

*Retail versions of Phenom are slightly slower than the reviewed samples due to a slower NB speed, 1.8GHz instead of 2GHz. This also lowers the L3 cache frequency to 1.8GHz, since the L3 cache runs at the NB speed.

This is hardly the news AMD needed. They are lurching from one disaster to another. This is one of the most botched launches I can recall; even the P3 1.13GHz fiasco doesn't compare to this.

In hindsight, Phenom (and arguably Barcelona) should never have been released in this state. B2 is clearly buggy and not ready for retail circulation. They should have waited until the B3 stepping, or at the very least until the TLB bug was fixed.
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
yea but if they kept on waiting they would have lost MORE money...

Sell it now, finish it later.
 

j0j081

Banned
Aug 26, 2007
1,090
0
0
well sucks for now yeah but I still think Phenom shows potential for down the road.
 

DrMrLordX

Lifer
Apr 27, 2000
21,992
11,547
136
What a mess. I'm sort of glad my motherboard does not yet have proper support for Phenoms . . . this protected me from buying a buggy B2 chip. Blah!
 

bradley

Diamond Member
Jan 9, 2000
3,671
2
81
The bug only exhibits itself if you overclock, even then it's fairly rare.
 

zsdersw

Lifer
Oct 29, 2003
10,505
2
0
Originally posted by: bradley
The bug only exhibits itself if you overclock, even then it's fairly rare.

Okay.. as long as you apply the same standard to Intel.

 

bradley

Diamond Member
Jan 9, 2000
3,671
2
81
Originally posted by: zsdersw
Originally posted by: bradley
The bug only exhibits itself if you overclock, even then it's fairly rare.

Okay.. as long as you apply the same standard to Intel.

I'm just talking about real-world forum anecdotes. Otherwise you have to depend on the people making these reports, and there appears to be a deep-rooted bias lately. I keep waiting for the other side to be represented. And we know that Intel has really deep pockets.
 

zsdersw

Lifer
Oct 29, 2003
10,505
2
0
and there appears to be a deep-rooted bias lately. I keep waiting for the other side to be represented. And we know that Intel has really deep pockets.

Irrelevant. If you don't apply the same standard for errata and bugs to Intel as you do to AMD, you're just being another fanboy.
 

bradley

Diamond Member
Jan 9, 2000
3,671
2
81
No offense, but I've always found the term fanboy a bit silly, perhaps even juvenile. Although, I've struggled with AMD and Intel systems alike, applied microcode and BIOS upgrades to stamp out major bugs. In that respect I'd imagine no company in such a fast-paced field is immune.
 

harpoon84

Golden Member
Jul 16, 2006
1,084
0
0
Originally posted by: bradley
The bug only exhibits itself if you overclock, even then it's fairly rare.

I wouldn't be so sure.

Extremetech couldn't finish testing After Effects CS3 due to instability.
http://www.extremetech.com/art.../0,1697,2226946,00.asp

There was another site (I can't remember which, I read like 10 reviews on launch day) which couldn't complete 3DMark06 at stock speeds either.

If the problem only happens when you overclock, AMD wouldn't be releasing a 'stability fix' for it, would they? Clearly, there are certain situations where even under stock speeds there are issues.
 

bfdd

Lifer
Feb 3, 2007
13,312
1
0
Originally posted by: harpoon84
Originally posted by: bradley
The bug only exhibits itself if you overclock, even then it's fairly rare.

I wouldn't be so sure.

Extremetech couldn't finish testing After Effects CS3 due to instability.
http://www.extremetech.com/art.../0,1697,2226946,00.asp

There was another site (I can't remember which, I read like 10 reviews on launch day) which couldn't complete 3DMark06 at stock speeds either.

If the problem only happens when you overclock, AMD wouldn't be releasing a 'stability fix' for it, would they? Clearly, there are certain situations where even under stock speeds there are issues.

That's what I never understood when everyone was saying it only happens at 2.4GHz or more. That made absolutely no sense to me. So 2399MHz was fine but going above 2400MHz causes the problem? Plus, wasn't this problem occurring when the CPU was under 100% load? Wouldn't a 2.2 or 2.3GHz CPU hit 100% load sooner than a 2.4GHz CPU? AMD shipped a faulty product, and instead of accepting just a "stability fix" that costs 10% of the performance, consumers and OEMs should demand a replacement without the TLB erratum. If AMD refuses, then a class action lawsuit. Sounds cruel, but they shouldn't have released a faulty product that can't live up to the hype they built around it.
 

v8envy

Platinum Member
Sep 7, 2002
2,720
0
0
You may have something there. The carefully crafted preview a week or two from release (to build expectations / press buzz), held knowing full well the product will behave differently in the wild, can certainly be spun as intentional fraud. The key there being: did AMD know the performance-torpedoing fix for the erratum would be necessary before they made the decision to ship?

I'm sure there's more than one lawyer taking a breather from chasing ambulances who's thinking this exact same thing right now.

Oh, and if you bought a Phenom -- I think you're still within your rights to return the platform as defective. Unfortunately this will also hurt motherboard manufacturers. Now they probably will have a case vs. AMD, but being partners and all I'm sure that sort of thing will be handled outside the courtroom.
 

bfdd

Lifer
Feb 3, 2007
13,312
1
0
Originally posted by: v8envy
You may have something there. The carefully crafted preview a week or two from release (to build expectations / press buzz), held knowing full well the product will behave differently in the wild, can certainly be spun as intentional fraud. The key there being: did AMD know the performance-torpedoing fix for the erratum would be necessary before they made the decision to ship?

I'm sure there's more than one lawyer taking a breather from chasing ambulances who's thinking this exact same thing right now.

Oh, and if you bought a Phenom -- I think you're still within your rights to return the platform as defective. Unfortunately this will also hurt motherboard manufacturers. Now they probably will have a case vs. AMD, but being partners and all I'm sure that sort of thing will be handled outside the courtroom.

Well, I'm going to say of course AMD knew they had a problem. They didn't just make a chip and not test every aspect of it before shipment. They had probably been working on this fix before the release too. Like any sane enthusiast, I don't want AMD to die, but they should be held accountable for this.
 

taltamir

Lifer
Mar 21, 2004
13,576
6
76
there is no magical 2400MHz barrier at which point the problem starts happening... it happens at any speed, it is just extremely likely to happen at higher speeds and kinda rare at lower speeds. But kinda rare crashing is not acceptable for a NON overclocked chip.
 

v8envy

Platinum Member
Sep 7, 2002
2,720
0
0
Originally posted by: bfdd

Well, I'm going to say of course AMD knew they had a problem. They didn't just make a chip and not test every aspect of it before shipment. They had probably been working on this fix before the release too.

Finding and debugging race conditions in concurrent resource access is not a trivial thing to do. It's not like you can create a test case, put it into your regression test framework and expect it to get hit. If the problem happens 1 time in say 10^20 operations you could test the living crap out of your product and never see the problem. And then ship and have half your customers encounter the problem immediately because they're using different data.
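To put rough numbers on that, here is a minimal sketch of the "at least once" arithmetic. It assumes every operation is independent and has the same tiny chance of tripping the bug, which is a simplification since the real trigger depends on the data and workload. The operation counts are made-up figures for illustration, not anything AMD has published, and `p_at_least_once` is just a helper name chosen here.

```python
import math


def p_at_least_once(p_per_op: float, n_ops: float) -> float:
    """Chance that an event with per-operation probability p_per_op
    shows up at least once in n_ops independent operations."""
    # 1 - (1 - p)^n, computed through log1p/expm1 so that a tiny p
    # is not rounded away by floating point.
    return -math.expm1(n_ops * math.log1p(-p_per_op))


# Hypothetical numbers, for illustration only.
P_BUG = 1e-20      # the "1 time in 10^20" figure used above
LAB_OPS = 1e15     # assumed operations covered by in-house testing
FIELD_OPS = 1e21   # assumed operations across all customers combined

print(f"lab:   {p_at_least_once(P_BUG, LAB_OPS):.3g}")
print(f"field: {p_at_least_once(P_BUG, FIELD_OPS):.3g}")
```

With these assumed counts the lab run almost never sees the bug, while the installed base as a whole is close to certain to hit it somewhere. A workload that happens to exercise the bad path more often would shift the odds much further.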

That said, having so many review sites encounter problems running benchmarks and test suites combined with a fix already available does not smell right. Since this is such a hard problem to diagnose I'd have a hard time believing engineers knew nothing about it yet managed to come up with a fix in a matter of days once it became known. Time to trivial change in most organizations is far longer than that, and this is not a trivial change. I'd bet money the fix was in QA when the preview event was happening.

 

Zstream

Diamond Member
Oct 24, 2005
3,395
277
136
The BIOS update keeps the CPU from hitting the area that triggers the bug. So, with the BIOS update it is 10% slower than normal. So learn your tech and come back to post.
 

SexyK

Golden Member
Jul 30, 2001
1,343
4
76
Originally posted by: v8envy
Originally posted by: bfdd

Well, I'm going to say of course AMD knew they had a problem. They didn't just make a chip and not test every aspect of it before shipment. They had probably been working on this fix before the release too.

Finding and debugging race conditions in concurrent resource access is not a trivial thing to do. It's not like you can create a test case, put it into your regression test framework and expect it to get hit. If the problem happens 1 time in say 10^20 operations you could test the living crap out of your product and never see the problem. And then ship and have half your customers encounter the problem immediately because they're using different data.

That said, having so many review sites encounter problems running benchmarks and test suites combined with a fix already available does not smell right. Since this is such a hard problem to diagnose I'd have a hard time believing engineers knew nothing about it yet managed to come up with a fix in a matter of days once it became known. Time to trivial change in most organizations is far longer than that, and this is not a trivial change. I'd bet money the fix was in QA when the preview event was happening.

Your point about not being able to test for every single possible erratum off the bat is all well and good, but you would think that running the chip through basic benchmarking suites and day-to-day usage scenarios would be a given. There is no way that a chip should be released if it cannot handle these basic scenarios.

There is no way AMD didn't know about this issue before releasing the chip. Maybe they thought they could fix it with a BIOS or microcode update and got burned. Either way, they knew.
 

bfdd

Lifer
Feb 3, 2007
13,312
1
0
Originally posted by: SexyK
Originally posted by: v8envy
Originally posted by: bfdd

Well, I'm going to say of course AMD knew they had a problem. They didn't just make a chip and not test every aspect of it before shipment. They had probably been working on this fix before the release too.

Finding and debugging race conditions in concurrent resource access is not a trivial thing to do. It's not like you can create a test case, put it into your regression test framework and expect it to get hit. If the problem happens 1 time in say 10^20 operations you could test the living crap out of your product and never see the problem. And then ship and have half your customers encounter the problem immediately because they're using different data.

That said, having so many review sites encounter problems running benchmarks and test suites combined with a fix already available does not smell right. Since this is such a hard problem to diagnose I'd have a hard time believing engineers knew nothing about it yet managed to come up with a fix in a matter of days once it became known. Time to trivial change in most organizations is far longer than that, and this is not a trivial change. I'd bet money the fix was in QA when the preview event was happening.

Your point about not being able to test for every single possible erratum off the bat is all well and good, but you would think that running the chip through basic benchmarking suites and day-to-day usage scenarios would be a given. There is no way that a chip should be released if it cannot handle these basic scenarios.

There is no way AMD didn't know about this issue before releasing the chip. Maybe they thought they could fix it with a BIOS or microcode update and got burned. Either way, they knew.

That's what I mean. They knew about this for sure. And to say they wouldn't test every aspect before release is even worse than saying they let this flaw out. They spent years on R&D, and that includes testing to make sure it's stable. AMD released the flawed chip fully knowing it would cause stability issues and that there would be no simple fix without hurting performance.
 

v8envy

Platinum Member
Sep 7, 2002
2,720
0
0
Originally posted by: bfdd

That's what I mean. They knew about this for sure.


It's not that clear. Originally this erratum was blamed for the lack of 2.4GHz or faster chips, then the story was that it only happened at faster clock rates, and currently the story is that the problem is unrelated to clock speed. (source: techreport interview with AMD PR guy.)

That certainly sounds to me like someone knew about a problem, but not the exact cause of the problem or the implications.

OTOH there was mention of the errata being available to board partners before launch. But since none of them have the fix out yet it can't have been that much before launch.

The 3xx0 video cards underperform NV's equivalents, yet still sell like hotcakes. A slower than Intel's equivalent quad core would likewise sell at some price. A crashy platform, not so much.


 

myocardia

Diamond Member
Jun 21, 2003
9,291
30
91
Originally posted by: v8envy
It's not that clear. Originally this erratum was blamed for the lack of 2.4GHz or faster chips, then the story was that it only happened at faster clock rates, and currently the story is that the problem is unrelated to clock speed. (source: techreport interview with AMD PR guy.)

I'm just wondering what the next story will be.:)

The 3xx0 video cards underperform NV's equivalents, yet still sell like hotcakes. A slower than Intel's equivalent quad core would likewise sell at some price. A crashy platform, not so much.

No doubt. They would definitely have sold quite a few, if only to some of the people who already owned AM2 motherboards. But, if I owned an AM2 motherboard, I wouldn't even be considering a Phenom.
 

AmberClad

Diamond Member
Jul 23, 2005
4,914
0
0
After reading the various reports about the TLB errata, I still haven't found a real clear answer to the question that's on my mind:

- Is this simply a problem with the logic of the TLB? (i.e. can be fixed 100% by a microcode update)
- Or is there a flaw in the architectural design of it? (a hardware issue)
 

ViRGE

Elite Member, Moderator Emeritus
Oct 9, 1999
31,516
167
106
Originally posted by: AmberClad
After reading the various reports about the TLB errata, I still haven't found a real clear answer to the question that's on my mind:

- Is this simply a problem with the logic of the TLB? (i.e. can be fixed 100% by a microcode update)
- Or is there a flaw in the architectural design of it? (a hardware issue)

It's a hardware flaw. They can work around it with microcode at a sizable performance penalty, but fixing it requires a new spin of the processor.
 

DrMrLordX

Lifer
Apr 27, 2000
21,992
11,547
136
Originally posted by: myocardia


No doubt. They would definitely have sold quite a few, if only to some of the people who already owned AM2 motherboards. But, if I owned an AM2 motherboard, I wouldn't even be considering a Phenom.

I would buy one, or at least consider it, if Abit would release a BIOS update for my board to support Phenom and if I could disable the L3 cache.

Sadly, I don't think any production AM2 or AM2+ board will allow you to disable the L3 cache in BIOS.