Skylake AVX bug

Dresdenboy

See https://communities.intel.com/thread/96157?start=0&tstart=0

It appears when Prime95 runs specific FFT sizes while it is forced to use AVX instead of AVX2/FMA3. As described in the thread above, FMA3 has to be disabled to trigger it.
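
To make the "forced to use AVX" part concrete: programs like Prime95 pick a code path at runtime from the CPU's feature flags, and masking a feature makes them fall back to an older path. The sketch below is hypothetical (the function name, the flag spellings and the path names are assumptions, not Prime95's actual code or configuration options). It only shows why disabling FMA3 lands a Skylake core on the plain-AVX path.

```python
def select_fft_path(cpu_flags, disabled=frozenset()):
    """Pick the most capable code path the CPU supports, minus masked features.

    Hypothetical sketch: flag names follow Linux /proc/cpuinfo spelling
    ("fma", "avx2", "avx", "sse2"); real programs use their own dispatch logic.
    """
    usable = set(cpu_flags) - set(disabled)
    if {"avx2", "fma"} <= usable:
        return "fma3"   # AVX2 + FMA3 kernels
    if "avx" in usable:
        return "avx"    # 256-bit AVX kernels without fused multiply-add
    if "sse2" in usable:
        return "sse2"   # 128-bit SSE2 kernels
    return "x87"        # last-resort scalar path


skylake_core = {"sse2", "avx", "avx2", "fma"}
print(select_fft_path(skylake_core))                    # fma3
print(select_fft_path(skylake_core, disabled={"fma"}))  # avx
```

A CPU without AVX at all, such as the Pentium G4400, would fall through to the SSE2 path under the same logic.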

Since the fix will come as a BIOS update, it could be power-management related (voltage/frequency), as marginal voltages or frequencies can cause calculation errors. If this is the case, Skylake users in this forum might try to provoke such errors by applying slightly lower voltages or higher frequencies.
 

Techhog

This has been known and has existed since Haswell. It's a design flaw, not a bug.
 

zir_blazer

This has been known and has existed since Haswell. It's a design flaw, not a bug.
Check the comments. The people who discovered it claim that it doesn't happen on previous platforms; it's Skylake only. And Intel itself says on the second page that it was able to reproduce the bug and is preparing a fix.
 

ShintaiDK

BIOS (microcode) fixes should arrive this month. Perhaps MS will also distribute the update through Windows.
 

Brunnis

I've been following this for the past week. As said, let's hope the microcode update doesn't affect performance in any significant way. Even if it does have an effect, I doubt it would be of any significance to "general usage".

This has been known and has existed since Haswell. It's a design flaw, not a bug.
As said, it's not present in Haswell (at least not to anyone's knowledge). Besides, in what case is it not to be considered a bug when a certain instruction mix causes the CPU to halt execution?
 

dmens

From the thread it *sounds* like a circuit marginality issue: "under certain complex workload conditions"

The fix should not affect performance.
 
Could you, or somebody else, explain in more detail what this means exactly? I would greatly appreciate it. Thanks!
 

dmens

Sure. You can bucket CPU bugs into logical and circuit bugs.

Logical bugs are in the RTL itself, so no amount of lowering the frequency or raising the voltage can fix them. Examples are FDIV and, more recently, TSX. Those bugs are fixed by a new stepping, a workaround in the microcode, or, in the worst case, by completely disabling the affected feature.
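
The FDIV case shows why a logic bug ignores voltage and frequency: the flawed divider returned the same wrong quotient for certain operands every time. The widely cited test pair is 4195835 / 3145727, where a correct FPU gives about 1.333820449 and the flawed Pentium returned about 1.333739069. A minimal check, assuming the reference values quoted in the comments:

```python
# Classic Pentium FDIV test operands.
NUMERATOR = 4195835
DENOMINATOR = 3145727

CORRECT_QUOTIENT = 1.333820449136241  # correct result (widely published value)
FLAWED_QUOTIENT = 1.333739068902038   # result reported for the flawed Pentium divider


def fdiv_looks_correct(quotient, tolerance=1e-12):
    """Return True if a computed quotient matches the correct value."""
    return abs(quotient - CORRECT_QUOTIENT) <= tolerance


q = NUMERATOR / DENOMINATOR
print(f"{q:.15f}", "OK" if fdiv_looks_correct(q) else "FDIV-style error")
print(fdiv_looks_correct(FLAWED_QUOTIENT))  # False: wrong from the 4th decimal place
```

On a flawed chip the wrong answer came back at any clock speed or voltage, which is what separates it from the circuit bugs described next.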

Circuit bugs are cases where the circuit does not behave as the manufacturer says it should within the defined parameters (voltage, frequency, temperature, etc.). For example, a part is sold at 3 GHz, but there is a workload that fails at 3 GHz and works fine once you downclock to 2.8 GHz. Lots of things can cause this (too many to cover here).
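
The 3 GHz vs. 2.8 GHz example comes down to timing slack: a signal must settle within one clock period, and the period grows as the clock drops. The sketch below uses a made-up critical-path delay of 345 ps (a hypothetical figure, not Skylake data) to show how the same path can fail at one frequency and pass at a slightly lower one.

```python
def clock_period_ps(freq_ghz):
    """One clock period in picoseconds (1 GHz -> 1000 ps)."""
    return 1000.0 / freq_ghz


def timing_slack_ps(path_delay_ps, freq_ghz):
    """Positive slack: the path settles in time; negative: it misses the clock edge."""
    return clock_period_ps(freq_ghz) - path_delay_ps


CRITICAL_PATH_PS = 345.0  # hypothetical worst-case path delay for one workload

for f in (3.0, 2.8):
    slack = timing_slack_ps(CRITICAL_PATH_PS, f)
    status = "passes" if slack >= 0 else "fails"
    print(f"{f} GHz: period {clock_period_ps(f):.1f} ps, slack {slack:+.1f} ps -> {status}")
```

Higher voltage generally makes transistors switch faster and shortens the path delay, which is why either raising voltage or lowering frequency can hide a circuit bug, while neither changes the outcome of a logic bug.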

AFAIK (but don't take my word for it; I don't write errata), the typical errata wording for logic bugs is "under some corner case", whereas for a circuit bug it would be "under some workload".
 
That was a really helpful explanation. Thank you!
 

ShintaiDK

By microcode update or how? And how will it be deployed? Any details?

Yes, microcode, delivered through BIOS updates and Windows Update (the Mcupdate_GenuineIntel.dll and mcupdate_AuthenticAMD.dll files in Windows).
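
Whichever channel delivers it, you can confirm that new microcode is loaded by checking the reported revision. The post is about Windows; as an assumption for illustration, the sketch below uses Linux, where x86 kernels list the revision in a `microcode` field of /proc/cpuinfo. The sample text and its revision number are hypothetical, not taken from a real machine.

```python
def microcode_revisions(cpuinfo_text):
    """Return the set of microcode revisions listed in /proc/cpuinfo text (x86 Linux)."""
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(int(value.strip(), 16))  # reported in hex, e.g. 0x9e
    return revisions


def read_local_revisions(path="/proc/cpuinfo"):
    """Read the revisions from this machine (Linux only)."""
    with open(path, encoding="utf-8") as f:
        return microcode_revisions(f.read())


# Illustrative two-core excerpt; the revision number is hypothetical.
SAMPLE = "processor\t: 0\nmicrocode\t: 0x9e\nprocessor\t: 1\nmicrocode\t: 0x9e\n"
print({hex(r) for r in microcode_revisions(SAMPLE)})  # {'0x9e'}
```

Comparing the revision before and after a BIOS or OS update shows whether the fix actually reached the CPU.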
 

VirtualLarry

I wonder if this will fix my image glitch issues in Waterfox. It is compiled with the Intel compiler, with extensive Intel optimizations, including AVX, although it shouldn't be using an AVX code path on the Pentium G4400, because that CPU lacks those instructions.
 

coercitiv

If this is the case, Skylake users in this forum might try to provoke such errors by applying slightly lower voltages or higher frequencies.
I'm a bit confused:

  • People reporting the bug on the Intel forum stated repeatedly that it is independent of voltage and frequency.
  • Why try to replicate the bug using a potentially unstable configuration to begin with?
  • Does the fix involve raising voltages in any way?
 

Dresdenboy

Do you mean stock vs. OC? That doesn't rule out that Woltman's highly optimized code could exceed the stock voltage guard bands.
 

ShintaiDK

What makes you say that?

Because they very rarely affect performance, and there is no information saying this one will. So why do you think it will affect performance?

AMD and Intel fix bugs all the time, yet performance is unchanged.
 

coercitiv

Do you mean stock vs. OC? That doesn't mean that Woltman's highly optimized codes couldn't exceed stock voltage guard bands.
And how exactly is undervolting supposed to reveal that a certain code path manages to exceed stock voltage guard bands?
 

Nothingness

Because they very rarely affect performance, and there is no information saying this one will. So why do you think it will affect performance?
Unlike you, I don't claim it will or it won't, because Intel hasn't said anything one way or the other ;)

AMD and Intel fix bugs all the time, yet performance is unchanged.
Don't you remember the AMD Phenom TLB fix?