Intel Skylake/Kaby Lake processors: Broken Hyper-Threading?

AnandTech Forums, page 2

Elixer

Lifer
May 7, 2002
Well, at least microcode can fix it... hopefully without performance regressions.

Just benchmark everything again! ;)
 

LTC8K6

Lifer
Mar 10, 2004
Few if any people have ever gotten the chips to misbehave with normal workloads and testing, apparently...

https://www.extremetech.com/computi...estabilize-intel-cpus-based-kaby-lake-skylake


Intel has apparently issued microcode updates for Skylake and Kaby Lake processors to address this problem, but they’ll need to be integrated into motherboard UEFI to work effectively. We recommend checking for board updates to see if there’s an update available if you’re using a Skylake or Kaby Lake chip.

For the record, while ExtremeTech believes Intel that these errata exist, we are not aware of any software programs affected by them and have not observed any issues with our Skylake or Kaby Lake testbeds. Our Core i7-7700K and Core i7-6700K both performed flawlessly when tested in our benchmark suites over the past six months.

If you’re having a specific problem with a piece of software that cropped up once you moved to Kaby Lake or Skylake, we recommend shutting off Hyper-Threading and seeing if that resolves the problem. Hopefully motherboard manufacturers will have solutions ready to go sooner rather than later.
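The article above recommends checking for firmware updates. On Linux, one rough way to see which microcode revision is actually loaded is the `microcode` field that the kernel reports for each logical CPU in `/proc/cpuinfo`. Below is a minimal Python sketch under that assumption (Linux only; the helper name `microcode_revisions` is hypothetical, not part of any tool mentioned in this thread). It does not know which revision contains Intel's fix; you would still have to compare its output against your board vendor's or distribution's advisory.

```python
import os

# Assumption: Linux, where the kernel exposes per-CPU details in this file.
CPUINFO_PATH = "/proc/cpuinfo"


def microcode_revisions(cpuinfo_text):
    """Return the set of microcode revisions found in /proc/cpuinfo-style text.

    Each logical CPU block contains a line of the form "microcode : 0x<hex>";
    the kernel prints the value in hex, so it is parsed with base 16.
    """
    revisions = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "microcode":
            revisions.add(int(value.strip(), 16))
    return revisions


if __name__ == "__main__" and os.path.exists(CPUINFO_PATH):
    with open(CPUINFO_PATH) as f:
        # Print each distinct revision currently loaded on this machine.
        for rev in sorted(microcode_revisions(f.read())):
            print("microcode revision:", hex(rev))
```

Parsing is kept separate from file access so the helper can be fed a saved copy of `/proc/cpuinfo` from another machine.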
 

Ajay

Lifer
Jan 8, 2001
Basically, don't worry about it: all processors have errata (mistakes). Anything people design and build has mistakes in it. Go look up your automobile, for instance.

Yeah, but it's just irresistible when it comes to giving Intel a black eye, apparently. Everyone on the web can now claim that they are not beholden to Intel, even if they get free CPUs and other swag...
 

RLGL

Platinum Member
Jan 8, 2013
This is not the first time over the years that I have seen this type of issue require a fix to the CPU firmware (microcode). Both AMD and Intel have had their bugs.
 

Nothingness

Diamond Member
Jul 3, 2013
Ajay said:
> Basically, don't worry about it: all processors have errata (mistakes). Anything people design and build has mistakes in it. Go look up your automobile, for instance.

Anything that can cause (silent) data corruption is a big deal for me, so I worry about it. In this case it looks like the fix is in, but that doesn't make it any less worrying.
 

Borealis7

Platinum Member
Oct 19, 2006
I've been using a 7700K for gaming since launch and haven't had a problem with it. I'm not sure I'll even bother updating the motherboard BIOS for this.
 

Phynaz

Lifer
Mar 13, 2006
Nothingness said:
> Anything that can cause (silent) data corruption is a big deal for me, so I worry about it. In this case it looks like the fix is in, but that doesn't make it any less worrying.

This wasn't about silent data corruption, though; this is a blue/green-screen error, a crash bug. And the conditions that make it happen are next to impossible to create. Even with a program specifically designed to trigger it, it isn't repeatable; it only occurs sometimes. The chance of real software triggering this bug is minute.
 

richaron

Golden Member
Mar 27, 2012
Phynaz said:
> This wasn't about silent data corruption, though; this is a blue/green-screen error, a crash bug. And the conditions that make it happen are next to impossible to create. Even with a program specifically designed to trigger it, it isn't repeatable; it only occurs sometimes. The chance of real software triggering this bug is minute.

Yeah, that's also true of many widely publicized bugs in the tech industry. This particular bug was produced by a company with obvious "fans" in this forum who are willing to defend it and deflect the blame.

Like all those silicon bugs in the past that only occurred "sometimes" and were "next to impossible to cause", this is still a major bug. It's a major bug that needed a major response, but in this case the response was amazingly quiet. Personally, I have much more respect for companies that own their mistakes rather than sweep them under the rug. And that, I suspect, is why people are finally spreading this information now that it has come to light.
 

Phynaz

Lifer
Mar 13, 2006
richaron said:
> Like all those silicon bugs in the past that only occurred "sometimes" and were "next to impossible to cause", this is still a major bug. It's a major bug that needed a major response, but in this case the response was amazingly quiet. Personally, I have much more respect for companies that own their mistakes rather than sweep them under the rug. And that, I suspect, is why people are finally spreading this information now that it has come to light.

"Major bug"? No, it's not, as shown by the fact that it isn't getting a silicon fix.

I'd really like to know why you feel this was "swept under the rug". It was disclosed the same way every processor maker has disclosed bugs since forever.

What brought it to light was a post on a mailing list by somebody who had no idea what they were talking about, which was then picked up across the net for the clicks.
 

LTC8K6

Lifer
Mar 10, 2004
richaron said:
> Yeah, that's also true of many widely publicized bugs in the tech industry. This particular bug was produced by a company with obvious "fans" in this forum who are willing to defend it and deflect the blame.
>
> Like all those silicon bugs in the past that only occurred "sometimes" and were "next to impossible to cause", this is still a major bug. It's a major bug that needed a major response, but in this case the response was amazingly quiet. Personally, I have much more respect for companies that own their mistakes rather than sweep them under the rug. And that, I suspect, is why people are finally spreading this information now that it has come to light.

Yes, that's why Intel and AMD publish chip errata... to hide them...
 

richaron

Golden Member
Mar 27, 2012
Phynaz said:
> "Major bug"? No, it's not, as shown by the fact that it isn't getting a silicon fix.
>
> I'd really like to know why you feel this was "swept under the rug". It was disclosed the same way every processor maker has disclosed bugs since forever.
>
> What brought it to light was a post on a mailing list by somebody who had no idea what they were talking about, which was then picked up across the net for the clicks.

I guess we have different ideas of a "major" bug. I consider it a major bug if the computer crashes while running supposedly supported code, and that's exactly what is happening here. It's a major bug because the CPU is fundamentally broken.

It doesn't matter if it's a rare occurrence or if they've already put out a hack to fix it. This major bug should get just as much exposure as those in the past.

Plus, it shouldn't be a surprise that the problem only saw the light of day once an open-source software distribution became aware of it.
 

Nothingness

Diamond Member
Jul 3, 2013
Phynaz said:
> This wasn't about silent data corruption, though; this is a blue/green-screen error, a crash bug. And the conditions that make it happen are next to impossible to create. Even with a program specifically designed to trigger it, it isn't repeatable; it only occurs sometimes. The chance of real software triggering this bug is minute.

It does cause data corruption without crashes, and it happened often enough to trigger a bug report in the OCaml bug database. Please read these:

https://lists.debian.org/debian-devel/2017/06/msg00308.html
https://caml.inria.fr/mantis/view.php?id=7452

I'm not arguing this is catastrophic, but I find this kind of bug worrying.
 

JoeRambo

Golden Member
Jun 13, 2013
Nothingness said:
> It does cause data corruption without crashes, and it happened often enough to trigger a bug report in the OCaml bug database. Please read these:
>
> https://lists.debian.org/debian-devel/2017/06/msg00308.html
> https://caml.inria.fr/mantis/view.php?id=7452
>
> I'm not arguing this is catastrophic, but I find this kind of bug worrying.

If you had actually taken care to read those:

> One important point is that the code pattern that triggered the issue in OCaml was present on gcc-generated code. There were extra constraints being placed on gcc by OCaml, which would explain why gcc apparently rarely generates this pattern.

Translated: we forced GCC to generate BS code (and mixing partial-register accesses, such as AH, with full-register accesses has been a bad idea for a long time). I find it hard to believe there is other code like this in the wild.

But if you like to worry about CPU errata (and don't mind if it isn't an Intel CPU):

https://community.amd.com/thread/215773

Here is a full-blown, unresolved erratum that potentially affects every single Ryzen CPU sold. It's actually one that can bite every Linux user out there: imagine some Debian/Ubuntu etc. compile farm running on Ryzen generating subtly bad code for the distribution, which then crashes some nuclear reactor running Linux. The potential to worry is huge!

Irony aside, please stop trying to blow this erratum out of proportion.
 

Phynaz

Lifer
Mar 13, 2006
richaron said:
> It doesn't matter if it's a rare occurrence or if they've already put out a hack to fix it. This major bug should get just as much exposure as those in the past.

It has had as much exposure as any other bug. It was disclosed the exact same way AMD and Intel have been disclosing bugs for decades. Just because you haven't read about them in sky-is-falling posts doesn't mean they don't occur. Have you read any of the processor revision documentation that either LTC8K6 or I have linked to?

Here's one for AMD:
http://support.amd.com/TechDocs/55370_Rev_Guide_For_Family_15h_Models_70h-7Fh_Processors.pdf
 

LTC8K6

Lifer
Mar 10, 2004
richaron said:
> I guess we have different ideas of a "major" bug. I consider it a major bug if the computer crashes while running supposedly supported code, and that's exactly what is happening here. It's a major bug because the CPU is fundamentally broken.
>
> It doesn't matter if it's a rare occurrence or if they've already put out a hack to fix it. This major bug should get just as much exposure as those in the past.
>
> Plus, it shouldn't be a surprise that the problem only saw the light of day once an open-source software distribution became aware of it.

By your definition, every chip ever made by any manufacturer has major bugs and is fundamentally broken, and all of these major bugs should be written about extensively. That's going to take a lot of time and a lot of writing.
 

Nothingness

Diamond Member
Jul 3, 2013
JoeRambo said:
> If you had actually taken care to read those:
>
> Translated: we forced GCC to generate BS code (and mixing partial-register accesses, such as AH, with full-register accesses has been a bad idea for a long time). I find it hard to believe there is other code like this in the wild.

Go read the case in the OCaml report. It was production code, so I don't care how they used gcc; it happens in real production code. So we have at least one instance of such code. Can you prove it's rare? You can't, even if it obviously is rare.

> But if you like to worry about CPU errata (and don't mind if it isn't an Intel CPU):
>
> https://community.amd.com/thread/215773
>
> Here is a full-blown, unresolved erratum that potentially affects every single Ryzen CPU sold. It's actually one that can bite every Linux user out there: imagine some Debian/Ubuntu etc. compile farm running on Ryzen generating subtly bad code for the distribution, which then crashes some nuclear reactor running Linux. The potential to worry is huge!

Are you one of those people who are only here to stoke AMD vs. Intel fights? I'm not interested in this, so please go play your childish game with other people.

> Irony aside, please stop trying to blow this erratum out of proportion.

I'm not doing that. I'm just saying that people downplaying such a bug are clueless, no matter which CPU has it, AMD or Intel.