Massive security hole in CPU's incoming? Official Meltdown/Spectre Discussion Thread

CatMerc

Golden Member
Jul 16, 2016
1,114
1,149
136
http://pythonsweetness.tumblr.com/post/169166980422/the-mysterious-case-of-the-linux-page-table

TL;DR

Copying from the thread on 4chan

There is evidence of a massive Intel CPU hardware bug (currently under embargo) that directly affects big cloud providers like Amazon and Google. The fix will introduce notable performance penalties on Intel machines (30-35%).

People have noticed a recent development in the Linux kernel: a rather massive, important redesign (page table isolation) is being introduced very fast for kernel standards... and being backported! The "official" reason is to incorporate a mitigation called KASLR... which most security experts consider almost useless. There's also some unusual, suspicious stuff going on: the documentation is missing, some of the comments are redacted (https://twitter.com/grsecurity/status/947147105684123649) and people with Intel, Amazon and Google emails are CC'd.

According to one of the people working on it, PTI is only needed for Intel CPUs, AMD is not affected by whatever it protects against (https://lkml.org/lkml/2017/12/27/2). PTI affects a core low-level feature (virtual memory) and has severe performance penalties: 29% for an i7-6700 and 34% for an i7-3770S, according to Brad Spengler from grsecurity. PTI is simply not active for AMD CPUs. The kernel flag is named X86_BUG_CPU_INSECURE and its description is "CPU is insecure and needs kernel page table isolation".
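For anyone who wants to check their own box once a patched kernel is installed, here is a minimal sketch that parses /proc/cpuinfo-style text. It assumes the kernel exposes a "pti" entry on the "flags" line and lists known CPU bugs on a "bugs" line; the bug name is an assumption too, since the X86_BUG_CPU_INSECURE flag mentioned above appears to have been renamed (to cpu_meltdown) in later kernels. The function name and the sample text are made up for illustration.

```python
def pti_status(cpuinfo_text):
    """Return (pti_enabled, bug_names) parsed from /proc/cpuinfo-style text.

    Assumption: the kernel lists "pti" on the "flags" line when page table
    isolation is active, and reports CPU bugs on a separate "bugs" line.
    """
    flags, bugs = set(), set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key : value" line
        key = key.strip()
        if key == "flags":
            flags.update(value.split())
        elif key == "bugs":
            bugs.update(value.split())
    return "pti" in flags, sorted(bugs)


# Hypothetical sample, not copied from a real machine.
SAMPLE = (
    "processor\t: 0\n"
    "flags\t\t: fpu vme pti\n"
    "bugs\t\t: cpu_meltdown spectre_v1\n"
)
```

On a Linux machine you would feed it the real file, e.g. `pti_status(open("/proc/cpuinfo").read())`. If the "pti" flag is missing, either the kernel is unpatched, the CPU was not marked as affected, or PTI was turned off at boot.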

Microsoft has been silently working on a similar feature since November: https://twitter.com/aionescu/status/930412525111296000

People are speculating on a possible massive Intel CPU hardware bug that directly opens up serious vulnerabilities on big cloud providers which offer shared hosting (several VMs on a single host), for example by letting a VM read from or write to another one.

Quoted from a reddit thread.

This could be big. Many a sysadmin might have sleepless nights soon enough.

EDIT: Since news and clarification arrived, I'll add it here.
Official website with details: https://meltdownattack.com
TL;DR
There are two attacks exploiting similar ideas, called Meltdown and Spectre.

Meltdown affects all Intel CPUs going back a decade, and some select ARM CPUs. It is the more pressing issue of the two, and potentially compromises systems completely due to its power. Patches already went out on both Linux and Windows to mitigate it. The performance hit depends on workload; gaming is not noticeably affected.

Spectre affects all CPUs aside from specialized microcontrollers and other low-powered devices. It is harder to exploit but also harder to fix. Its full consequences and effects are still unknown, but all major tech companies are taking steps to research and mitigate it.

Intel Press Release: https://newsroom.intel.com/news/intel-responds-to-security-research-findings/

AMD Press Release: https://www.amd.com/en/corporate/speculative-execution

Apple Press Release: https://support.apple.com/en-us/HT208394

ARM Press Release: https://developer.arm.com/support/security-update



Updated title of the thread to include other CPU companies.


esquared
Anandtech Forum Director
 

CatMerc

Golden Member
Jul 16, 2016
1,114
1,149
136
Well, it is over 49% for processors from both manufacturers. It is just an implementation that alters the way page table walks are handled, causing the slowdown.
It has nothing to do with Xeon or EPYC specifically.
The slowdown happens if you apply the patch on both. Thing is, EPYC isn't affected by the bug it's fixing, therefore it doesn't need to get slowed down.
 
May 11, 2008
19,306
1,131
126
Does it have anything to do with the process-context identifiers (PCID) that Intel supports?
AMD has ASID, but that is all that I read about it.
 

jonijs

Junior Member
Jan 2, 2018
3
11
36
Does it have anything to do with the process-context identifiers (PCID) that Intel supports?
AMD has ASID, but that is all that I read about it.
https://lkml.org/lkml/2017/12/27/2
AMD processors are not subject to the types of attacks that the kernel
page table isolation feature protects against. The AMD microarchitecture
does not allow memory references, including speculative references, that
access higher privileged data when running in a lesser privileged mode
when that access would result in a page fault.

Disable page table isolation by default on AMD processors by not setting
the X86_BUG_CPU_INSECURE feature, which controls whether X86_FEATURE_PTI
is set.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
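The logic described in that patch note boils down to a vendor check. Here is a rough sketch of it in Python rather than kernel C, purely for illustration: the function names are made up, the vendor strings are the usual CPUID ones, and the real kernel code is not structured like this.

```python
def cpu_bug_flags(vendor_id):
    """Sketch of the patch description: every x86 vendor except AMD
    gets the X86_BUG_CPU_INSECURE bug flag. (Hypothetical helper.)"""
    bugs = set()
    if vendor_id != "AuthenticAMD":
        bugs.add("X86_BUG_CPU_INSECURE")
    return bugs


def pti_enabled_by_default(vendor_id):
    """Per the patch note, PTI is set only when the bug flag is present."""
    return "X86_BUG_CPU_INSECURE" in cpu_bug_flags(vendor_id)
```

So `pti_enabled_by_default("GenuineIntel")` is true and `pti_enabled_by_default("AuthenticAMD")` is false, which is why an AMD box would not pay the PTI cost by default with this patch applied.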
 

bbhaag

Diamond Member
Jul 2, 2011
6,606
1,991
146
Not that I don't believe what you're saying but as a general rule of thumb I never trust any post that starts out with "Copying from the thread on 4chan". So I'll wait and see if more official news breaks out with this story.
 

CatMerc

Golden Member
Jul 16, 2016
1,114
1,149
136
Not that I don't believe what you're saying but as a general rule of thumb I never trust any post that starts out with "Copying from the thread on 4chan". So I'll wait and see if more official news breaks out with this story.
It was a summary of a longer article that is the first thing I linked, above the quote. You can read it yourself, including all the sources.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
I do not see why AMD would not be affected. The attacks rely on the runtime difference of page table walks depending on whether the page is mapped or not. This way an attacker could potentially figure out the page mappings for the kernel even though KASLR is in place.
However, there must be runtime differences for AMD as well, unless it can figure out whether a page is mapped before the page table walk, which I think is impossible.

In addition, I do not consider this a processor bug at all, though it could be argued that ARM's TTBR0/TTBR1 splitting is more effective than x86's single CR3 for mapping user and kernel address space. In any case it's not a bug but an architectural weakness.
 

CatMerc

Golden Member
Jul 16, 2016
1,114
1,149
136
I do not see why AMD would not be affected. The attacks rely on the runtime difference of page table walks depending on whether the page is mapped or not. This way an attacker could potentially figure out the page mappings for the kernel even though KASLR is in place.
However, there must be runtime differences for AMD as well, unless it can figure out whether a page is mapped before the page table walk, which I think is impossible.

In addition, I do not consider this a processor bug at all, though it could be argued that ARM's TTBR0/TTBR1 splitting is more effective than x86's single CR3 for mapping user and kernel address space. In any case it's not a bug but an architectural weakness.
Supposedly it's down to speculative execution. Intel's uArch does not respect privileges, potentially exposing privileged code. AMD doesn't do this.
 
May 11, 2008
19,306
1,131
126
https://lkml.org/lkml/2017/12/27/2
AMD processors are not subject to the types of attacks that the kernel
page table isolation feature protects against. The AMD microarchitecture
does not allow memory references, including speculative references, that
access higher privileged data when running in a lesser privileged mode
when that access would result in a page fault.

Disable page table isolation by default on AMD processors by not setting
the X86_BUG_CPU_INSECURE feature, which controls whether X86_FEATURE_PTI
is set.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---

Ah, thank you.
 

FIVR

Diamond Member
Jun 1, 2016
3,753
911
106
I bet HPE is going to get orders pouring in for EPYC over this. People who bought Skylake-SP servers and now have to choose between being secure and having a 40% reduction in performance are not going to be pleased.


Anybody with legal knowledge have any idea if there will be lawsuits over this?
 

Dayman1225

Golden Member
Aug 14, 2017
1,152
973
146
I bet HPE is going to get orders pouring in for EPYC over this. People who bought Skylake-SP servers and now have to choose between being secure and having a 40% reduction in performance are not going to be pleased.


Anybody with legal knowledge have any idea if there will be lawsuits over this?

The bug is under NDA and wasn't previously known. As long as MS/Intel/other cloud providers come clean on the effects of it and so on, I can't imagine there will be lawsuits, though legal stuff really isn't my thing so I have no idea.
 

CatMerc

Golden Member
Jul 16, 2016
1,114
1,149
136
I bet HPE is going to get orders pouring in for EPYC over this. People who bought Skylake-SP servers and now have to choose between being secure and having a 40% reduction in performance are not going to be pleased.


Anybody with legal knowledge have any idea if there will be lawsuits over this?
Probably not. As long as it wasn't a known issue until recently.
 

Ken g6

Programming Moderator, Elite Member
Moderator
Dec 11, 1999
16,220
3,801
75
What performance is reduced by this? Just page file swaps? All RAM access? And I see it affects more than just Xeons. Will this hit Windows too?
 

CatMerc

Golden Member
Jul 16, 2016
1,114
1,149
136
What performance is reduced by this? Just page file swaps? All RAM access? And I see it affects more than just Xeons. Will this hit Windows too?
Hard to know exactly as details are still under NDA. As for Windows, supposedly it's getting a Kernel patch too, so it's not a Linux only thing.
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
Supposedly it's down to speculative execution. Intel's uArch does not respect privileges, potentially exposing privileged code. AMD doesn't do this.

Sure, but exploiting speculative execution is only one type of attack, while there are many other types of attacks where the KAISER/PTI solution removes the vulnerability. As an example, the JavaScript attack linked by William Gaatjes above has nothing to do with speculative execution. The double-fault attack described in the KAISER paper has nothing to do with speculative execution either.

What performance is reduced by this? Just page file swaps? All RAM access? And I see it affects more than just Xeons. Will this hit Windows too?

Performance is only reduced for context switches going to and from kernel mode. So the more kernel calls an application makes, the higher the impact.
 

Dayman1225

Golden Member
Aug 14, 2017
1,152
973
146
Dave Hansen said:
KAISER will affect performance for anything that does system calls or interrupts: everything. Just the new instructions (CR3 manipulation) add a few hundred cycles to a syscall or interrupt. Most workloads that we have run show single-digit regressions. 5% is a good round number for what is typical. The worst we have seen is a roughly 30% regression on a loopback networking test that did a ton of syscalls and context switches.
From this patchset back in Nov
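To put rough numbers on that quote, here is a back-of-the-envelope sketch. It assumes "a few hundred cycles" means about 300 extra cycles per syscall or interrupt and a 3 GHz core; both values are my assumptions, not figures from the patchset. It also ignores knock-on effects such as extra TLB misses, so treat it as a first-order estimate only.

```python
def pti_overhead_fraction(events_per_sec, extra_cycles=300, clock_hz=3.0e9):
    """Fraction of one core's cycles spent on the added CR3 switches.

    events_per_sec: syscalls plus interrupts per second on that core.
    extra_cycles:   assumed added cost per event ("a few hundred cycles").
    clock_hz:       assumed core clock; 3 GHz is just a round number.
    """
    return events_per_sec * extra_cycles / clock_hz


# Three hypothetical workloads, from light to syscall-heavy.
for rate in (50_000, 500_000, 3_000_000):
    print(f"{rate:>9} events/s -> {pti_overhead_fraction(rate):.1%} of a core")
```

Under those assumptions, half a million kernel entries per second costs about 5% of a core, and it takes around three million per second to reach the 30% worst case, which fits the picture of a loopback networking test hammering syscalls.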