Skylake AVX bug

Dresdenboy

Golden Member
Jul 28, 2003
1,730
0
136
citavia.blog.de
#1
See https://communities.intel.com/thread/96157?start=0&tstart=0

It appears when running Prime95 on specific FFT sizes while it's forced to use AVX instead of AVX2/FMA3. As described in the thread linked above, FMA3 has to be disabled.

As there will be a BIOS fix, it could be power management related (voltage/frequency) as this might cause calculation errors. If this is the case, Skylake users in this forum might try to provoke such errors by applying slightly lower voltages or higher frequencies.
 

Techhog

Platinum Member
Sep 11, 2013
2,834
0
26
#2
This has been known and has existed since Haswell. It's a design flaw, not a bug.
 

zir_blazer

Senior member
Jun 6, 2013
870
5
81
#3
This has been known and has existed since Haswell. It's a design flaw, not a bug.
Check the comments. The guys who discovered it claim that it doesn't happen on previous platforms; it's Skylake only. And Intel themselves say on the second page that they were able to reproduce the bug and are preparing a fix.
 
Apr 22, 2012
20,395
0
106
#4
BIOS (microcode) fixes should come this month. Perhaps MS will distribute it as well with Windows.
 
Mar 10, 2006
11,719
122
126
#5
Bummer. Let's hope the microcode update doesn't nerf performance...
 

Brunnis

Senior member
Nov 15, 2004
490
12
91
#7
I've been following this for the past week. As said, let's hope the microcode update doesn't affect performance in any significant way. Even if it does have an effect, I doubt it would be of any significance to "general usage".

This has been known and has existed since Haswell. It's a design flaw, not a bug.
As said, it's not present in Haswell (at least not to anyone's knowledge). Besides, in what case is it not to be considered a bug when a certain instruction mix causes the CPU to halt execution?
 

krumme

Diamond Member
Oct 9, 2009
5,744
129
136
#8
When it's a feature?
 
Apr 22, 2012
20,395
0
106
#9
The bug was already fixed 5 days ago, in case anyone wonders.
 

dmens

Golden Member
Mar 18, 2005
1,841
19
81
#10
From the thread it *sounds* like a circuit marginality issue: "under certain complex workload conditions"

The fix should not affect performance.
 
Mar 10, 2006
11,719
122
126
#11
From the thread it *sounds* like a circuit marginality issue: "under certain complex workload conditions"

The fix should not affect performance.
Could you, or somebody else, explain in more detail what this means exactly? I would greatly appreciate it. Thanks!
 

dmens

Golden Member
Mar 18, 2005
1,841
19
81
#12
Could you, or somebody else, explain in more detail what this means exactly? I would greatly appreciate it. Thanks!
Sure. You can bucket CPU bugs into logical and circuit bugs.

Logical bugs are in the RTL itself, so no amount of frequency slowing or voltage increasing can fix it. For example, FDIV, and more recently TSX. Those bugs are fixed by a new stepping, or a workaround in the microcode, or worst case, completely disabling the affected feature.

Circuit bugs are when the circuit does not behave correctly as the manufacturer said it should within defined parameters (voltage, frequency, temperature, etc). Say some part is sold at 3 GHz, but there is a workload that fails at 3 GHz; you downclock to 2.8 GHz and it works fine. Lots of things can cause this (too many to talk about here).

AFAIK (but don't take my word for this, I don't write errata) the typical terminology for logic bugs is "under some corner case", whereas a circuit bug would be "under some workload".
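
To make the distinction concrete, here is a minimal Python sketch of how one might triage a failing workload along those lines. Everything in it is hypothetical (the `Run` record, the `classify` helper, the labels); it only encodes the rule of thumb above: a failure that persists no matter how much margin you add looks logical, while one that goes away at a lower frequency or higher voltage looks like a circuit marginality.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One test run of the same workload (hypothetical record)."""
    freq_ghz: float
    voltage_v: float
    failed: bool


def classify(runs, rated_freq_ghz, rated_voltage_v):
    """Rough triage of a failure, following the rule of thumb above.

    Returns one of: "no-failure", "undetermined", "circuit", "logical".
    """
    def is_stock(r):
        return (math.isclose(r.freq_ghz, rated_freq_ghz)
                and math.isclose(r.voltage_v, rated_voltage_v))

    def is_relaxed(r):
        # More margin than stock: slower clock and/or higher voltage.
        return (not is_stock(r)
                and r.freq_ghz <= rated_freq_ghz
                and r.voltage_v >= rated_voltage_v)

    stock = [r for r in runs if is_stock(r)]
    relaxed = [r for r in runs if is_relaxed(r)]

    if not any(r.failed for r in stock):
        return "no-failure"      # nothing to explain at rated settings
    if not relaxed:
        return "undetermined"    # never tried adding margin
    if any(not r.failed for r in relaxed):
        return "circuit"         # extra margin makes the failure go away
    return "logical"             # fails regardless of margin


# Fails at the rated 3.0 GHz, passes when downclocked to 2.8 GHz.
print(classify([Run(3.0, 1.20, True), Run(2.8, 1.20, False)], 3.0, 1.20))
```

A real investigation would need many runs per operating point, since marginal failures are often intermittent; this sketch ignores that.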
 
Mar 10, 2006
11,719
122
126
#13
Sure. You can bucket CPU bugs into logical and circuit bugs.

Logical bugs are in the RTL itself, so no amount of frequency slowing or voltage increasing can fix it. For example, FDIV, and more recently TSX. Those bugs are fixed by a new stepping, or a workaround in the microcode, or worst case, completely disabling the affected feature.

Circuit bugs are when the circuit does not behave correctly as the manufacturer said it should within defined parameters (voltage, frequency, temperature, etc). Say some part is sold at 3 GHz, but there is a workload that fails at 3 GHz; you downclock to 2.8 GHz and it works fine. Lots of things can cause this (too many to talk about here).

AFAIK (but don't take my word for this, I don't write errata) the typical terminology for logic bugs is "under some corner case", whereas a circuit bug would be "under some workload".
That was a really helpful explanation. Thank you!
 
Apr 22, 2012
20,395
0
106
#15
By microcode update or how? And how will it be deployed? Any details?
Yes, microcode, delivered through BIOS updates and Windows Update (the Mcupdate_GenuineIntel.dll and mcupdate_AuthenticAMD.dll files in Windows).
 

Fjodor2001

Diamond Member
Feb 6, 2010
3,395
0
76
#16
Ok, thanks. I wonder if that will result in any performance penalty.
 
Aug 25, 2001
43,587
535
126
#18
Wonder if this will fix my image glitch issues in Waterfox? It is compiled with the Intel compiler, with extensive Intel optimizations, including AVX, although it shouldn't be utilizing an AVX codepath on the Pentium G4400, because it lacks those instructions.
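
For anyone wondering how a binary avoids an AVX codepath on a chip without AVX: it typically checks the CPU's feature flags at runtime and dispatches accordingly. Below is a rough Python sketch of that idea, not how the Intel compiler or Waterfox actually do it. The function names and codepath labels are made up for illustration; the flag names (`sse2`, `avx`, `avx2`, `fma`) are the ones Linux reports in `/proc/cpuinfo`.

```python
def pick_codepath(cpu_flags):
    """Pick the widest codepath the CPU flags allow (hypothetical labels).

    cpu_flags is a whitespace-separated flag string, as found on the
    "flags" line of /proc/cpuinfo on Linux.
    """
    flags = set(cpu_flags.lower().split())
    if {"avx2", "fma"} <= flags:
        return "avx2_fma3"
    if "avx" in flags:
        return "avx"
    if "sse2" in flags:
        return "sse2"
    return "scalar"


def read_linux_flags(path="/proc/cpuinfo"):
    """Return the first 'flags' line from /proc/cpuinfo, or '' if unavailable."""
    try:
        with open(path) as f:
            for line in f:
                if line.startswith("flags"):
                    return line.split(":", 1)[1]
    except OSError:
        pass
    return ""


# Illustrative flag subsets, not dumps from real machines.
no_avx_chip = "fpu sse sse2 ssse3 sse4_1 sse4_2 aes"
avx2_chip = "fpu sse sse2 ssse3 sse4_1 sse4_2 aes avx avx2 fma"

print(pick_codepath(no_avx_chip))  # sse2
print(pick_codepath(avx2_chip))    # avx2_fma3
print(pick_codepath(read_linux_flags()))  # whatever this machine supports
```

Native code does the same check with CPUID and also has to confirm that the OS saves the wider register state, which this sketch leaves out.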
 

coercitiv

Diamond Member
Jan 24, 2014
3,187
468
136
#20
If this is the case, Skylake users in this forum might try to provoke such errors by applying slightly lower voltages or higher frequencies.
I'm a bit confused:

  • People reporting the bug on the Intel forum stated repeatedly that this is voltage/frequency independent.
  • Why try to replicate the bug using a potentially unstable configuration to begin with?
  • Does the fix involve raising voltages in any way?
 

Dresdenboy

Golden Member
Jul 28, 2003
1,730
0
136
citavia.blog.de
#21
I'm a bit confused:

  • People reporting the bug on the Intel forum stated repeatedly that this is voltage/frequency independent.
  • Why try to replicate the bug using a potentially unstable configuration to begin with?
  • Does the fix involve raising voltages in any way?
Do you mean stock vs. OC? That doesn't mean that Woltman's highly optimized code couldn't exceed stock voltage guard bands.
 
Apr 22, 2012
20,395
0
106
#22
What makes you say that?
Because they very rarely affect performance. And there is no information suggesting that it will affect performance. So why do you think it will?

AMD and Intel fix bugs all the time, yet performance is unchanged.
 

coercitiv

Diamond Member
Jan 24, 2014
3,187
468
136
#23
Do you mean stock vs. OC? That doesn't mean that Woltman's highly optimized code couldn't exceed stock voltage guard bands.
And how exactly is undervolting supposed to reveal that a certain code path manages to exceed stock voltage guard bands?
 

Nothingness

Golden Member
Jul 3, 2013
1,884
32
106
#24
Because they very rarely affect performance. And there is no information suggesting that it will affect performance. So why do you think it will?
Unlike you, I don't think it will or it won't, because Intel didn't say anything one way or the other ;)

AMD and Intel fix bugs all the time, yet performance is unchanged.
Don't you remember AMD's TLB fix?
 