Massive security hole in CPU's incoming? Official Meltdown/Spectre Discussion Thread

csbin · Jan 4, 2018

The Register: We translated Intel's crap attempt to spin its way out of CPU security bug PR nightmare

As Linus Torvalds lets rip on Chipzilla

http://www.theregister.co.uk/2018/01/04/intel_meltdown_spectre_bugs_the_registers_annotations/

Translation: When malware steals your stuff, your Intel chip is working as designed. Also, this is why our stock price fell. Please make other stock prices fall, thank you.

Translation: Pleeeeeease, pleeeeease do not sue us for shipping faulty products or make us recall millions of chips.

Translation: We weren't the only one. And if we're going down, we're taking every last one of you with us.

Translation: Just fucking leave us alone.

Translation: We were gonna say something next week, but those bastards at The Register blew the lid on it early so, uh, so, fake news! Fake news! NO PUPPET!

Translation: Don't click on any bad links or emails, you rubes. Thanks for that. And remember to lock your doors at night.

Translation: Who else are you gonna buy this stuff from?

Paratus · Jan 4, 2018

kyubi said:
Confirmed all generations are vulnerable? This slowdown fix will push my migration to Ryzen a lot sooner

I’ve got to admit I’m feeling pretty good about having an unopened Threadripper and X399 sitting in my study ready to replace my old i7 920 / X58 PC.

Phynaz · Jan 4, 2018

bryanW1995 said:
You seem awfully defensive about all of this, and you're reading "near-zero" as "may be affected"...interesting.

So what you are saying is we should believe AMD's statement, but not Intel's or the researchers. How about ARM, should we believe their statements? Nvidia?

urvile · Jan 4, 2018

I find it interesting that AMD wasn't working with the NSA as well. Although it does make sense when you consider their market share.

Phynaz · Jan 4, 2018

beginner99 said:
The Intel social media PR team hard at work. Hint: You aren't doing any better than the ridiculous official Intel PR statement. Tell your bosses to stop making their company look like fools.

Ah yes, attack the poster...

Paratus · Jan 4, 2018

Phynaz said:
So what you are saying is we should believe AMD's statement, but not Intel's or the researchers. How about ARM, should we believe their statements?

Seems more like you’d like to avoid believing AMDs statements about their own processors in favor of Intels negative statements about AMD processors.

Wonder why that is?

Atari2600 · Jan 4, 2018

Paratus said:
In the human spaceflight business we would call that “links in the error chain”. Many links can end up being attributed to budget and schedule pressure.

People also tend to stop questioning design and process risks if it appears to have worked out in the past.

I don’t see why the chip business isn’t susceptible to the same problems.

In the commercial aerospace business we would call it more project manager/beancounter bullsh... shortcuts.

Doing something right takes time and money. Not offshoring it to a bunch of guys that can seemingly work on the problem - but often lack the knowledge or experience to put it all in context and understand what is happening, and maybe more importantly what isn't happening that should be.

So then you end up getting a load of crap dumping on you with no time or resource to fix the mess right - and a mad race to find shortcuts ensues.

urvile · Jan 4, 2018

Paratus said:
Seems more like you’d like to avoid believing AMDs statements about their own processors in favor of Intels negative statements about AMD processors.

Wonder why that is?

Of course phynaz does. Why would you want to believe that intel are the only ones working with the NSA to introduce vulnerabilities into their architecture?

Not that I know anything about security mind you. I just find something this wide spread a little curious.

Atari2600 · Jan 4, 2018

If Microsoft were to apply the patch to both AMD and Intel equally - would they be open to legal action by AMD if they refused to add the 2 lines of code in that it would take to qualify patch applicability?

William Gaatjes · Jan 4, 2018

This has got to be the most detailed and easy to understand explanation ever.

https://arstechnica.com/gadgets/2018/01/whats-behind-the-intel-design-flaw-forcing-numerous-patches/

Keeping track of addresses
Every byte of memory in a system is implicitly numbered, those numbers being each byte's address. The very earliest operating systems operated using physical memory addresses, but physical memory addresses are inconvenient for lots of reasons. For example, there are often gaps in the addresses, and (particularly on 32-bit systems), physical addresses can be awkward to manipulate, requiring 36-bit numbers, or even larger ones.

Accordingly, modern operating systems all depend on a broad concept called virtual memory. Virtual memory systems allow both programs and the kernels themselves to operate in a simple, clean, uniform environment. Instead of the physical addresses with their gaps and other oddities, every program, and the kernel itself, uses virtual addresses to access memory. These virtual addresses are contiguous—no need to worry about gaps—and sized conveniently to make them easy to manipulate. 32-bit programs see only 32-bit addresses, even if the physical address requires 36-bit or more numbering.

While this virtual addressing is transparent to almost every piece of software, the processor does ultimately need to know which physical memory a virtual address refers to. There's a mapping from virtual addresses to physical addresses, and that's stored in a large data structure called a page table. Operating systems build the page table, using a layout determined by the processor, and the processor and operating system in conjunction use the page table whenever they need to convert between virtual and physical addresses.
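The lookup described above can be sketched in a few lines. This is only a toy illustration: the 4 KiB page size, the single-level dictionary, and the names `page_table` and `translate` are assumptions made for the example, not the layout any real processor uses.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages

# Hypothetical single-level page table: virtual page number -> physical frame number.
page_table = {0: 7, 1: 3, 2: 12}

def translate(vaddr: int) -> int:
    """Translate a virtual address to a physical one, or fail if nothing is mapped."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in page_table:
        raise LookupError(f"page fault: no mapping for virtual page {vpn}")
    # The offset inside the page is unchanged; only the page number is remapped.
    return page_table[vpn] * PAGE_SIZE + offset

print(hex(translate(0x1234)))  # prints 0x3234
```

Real page tables are multi-level trees walked by hardware, but the idea is the same: split the address into a page number and an offset, then swap the page number for a frame number.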

This whole mapping process is so important and fundamental to modern operating systems and processors that the processor has dedicated cache—the translation lookaside buffer, or TLB—that stores a certain number of virtual-to-physical mappings so that it can avoid using the full page table every time.

The use of virtual memory gives us a number of useful features beyond the simplicity of addressing. Chief among these is that each individual program is given its own set of virtual addresses, with its own set of virtual to physical mappings. This is the fundamental technique used to provide "protected memory;" one program cannot corrupt or tamper with the memory of another program, because the other program's memory simply isn't part of the first program's mapping.

But these uses of an individual mapping per process, and hence extra page tables, puts pressure on the TLB cache. The TLB isn't very big—typically a few hundred mappings in total—and the more page tables a system uses, the less likely it is that the TLB will include any particular virtual-to-physical translation.
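To make the TLB pressure concrete, here is a rough sketch of a tiny translation cache with least-recently-used eviction. The `TLB` class, its capacity of four entries, and the LRU policy are all assumptions for illustration; real TLBs are hardware structures with their own replacement policies.

```python
from collections import OrderedDict

class TLB:
    """Toy translation cache: a few virtual-page -> frame entries, LRU eviction."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def lookup(self, vpn: int, page_table: dict) -> int:
        if vpn in self.entries:
            self.entries.move_to_end(vpn)  # mark as most recently used
            self.hits += 1
            return self.entries[vpn]
        self.misses += 1
        frame = page_table[vpn]  # slow path: consult the full page table
        self.entries[vpn] = frame
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return frame

demo_table = {vpn: vpn + 100 for vpn in range(8)}  # hypothetical mappings
tlb = TLB(capacity=4)
for vpn in [0, 1, 2, 3, 0, 1, 4, 0]:
    tlb.lookup(vpn, demo_table)
print(f"hits={tlb.hits} misses={tlb.misses}")  # prints hits=3 misses=5
```

The more distinct pages a workload touches relative to the capacity, the more lookups fall through to the slow path.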

Half and half
To make the best use of the TLB, every mainstream operating system splits the range of virtual addresses into two. One half of the addresses is used for each program; the other half is used for the kernel. When switching between processes, only half the page table entries change—the ones belonging to the program. The kernel half is common to every program (because there's only one kernel), and so it can use the same page table mapping for every process. This helps the TLB enormously; while it still has to discard mappings belonging to the process' half of memory addresses, it can keep the mappings for the kernel's half.
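A small sketch of why the shared kernel half helps: on a process switch only the user-half translations have to go. The even 2 GiB / 2 GiB split, the `KERNEL_BASE` constant, and the example addresses are assumptions chosen for the illustration; actual splits differ between operating systems.

```python
KERNEL_BASE = 0x8000_0000  # assumed even split of a 32-bit address space

def is_kernel_address(vaddr: int) -> bool:
    return vaddr >= KERNEL_BASE

def context_switch(tlb: dict) -> dict:
    """Return the cached translations that survive a switch to another process.

    User-half entries belong to the outgoing process and are discarded;
    kernel-half entries are identical in every process and can stay.
    """
    return {vaddr: frame for vaddr, frame in tlb.items() if is_kernel_address(vaddr)}

# Hypothetical cached translations: two user pages, two kernel pages.
cached = {
    0x0040_0000: 0x1000,
    0x0804_8000: 0x2000,
    0xC000_0000: 0x3000,
    0xC010_0000: 0x4000,
}
survivors = context_switch(cached)
print(len(survivors))  # prints 2
```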

This design isn't completely set in stone. Work was done on Linux to make it possible to give a 32-bit process the entire range of addresses, with no sharing between the kernel's page table and that of each program. While this gave the programs more address space, it carried a performance cost, because the TLB had to reload the kernel's page table entries every time kernel code needed to run. Accordingly, this approach was never widely used on x86 systems.

One downside of the decision to split the virtual address space between the kernel and each program is that the memory protection is weakened. If the kernel had its own set of page tables and virtual addresses, it would be afforded the same protection as different programs have from one another; the kernel's memory would be simply invisible. But with the split addressing, user programs and the kernel use the same address range, and, in principle, a user program would be able to read and write kernel memory.

To prevent this obviously undesirable situation, the processor and virtual addressing system have a concept of "rings" or "modes." x86 processors have lots of rings, but for this issue, only two are relevant: "user" (ring 3) and "supervisor" (ring 0). When running regular user programs, the processor is put into user mode, ring 3. When running kernel code, the processor is in ring 0, supervisor mode, also known as kernel mode.

These rings are used to protect the kernel memory from user programs. The page tables aren't just mapping from virtual to physical addresses; they also contain metadata about those addresses, including information about which rings can access an address. The kernel's page table entries are all marked as only being accessible to ring 0; the program's entries are marked as being accessible from any ring. If an attempt is made to access ring 0 memory while in ring 3, the processor blocks the access and generates an exception. The result of this is that user programs, running in ring 3, should not be able to learn anything about the kernel and its ring 0 memory.
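The permission check can be pictured like this. It is a simplified sketch: the `PageTableEntry` class, the single `user_accessible` flag, and the `ProtectionFault` exception are names invented for the example, standing in for the metadata bits the article mentions.

```python
from dataclasses import dataclass

USER_RING = 3        # "user" mode
SUPERVISOR_RING = 0  # "supervisor" / kernel mode

@dataclass(frozen=True)
class PageTableEntry:
    frame: int
    user_accessible: bool  # False means ring 0 only

class ProtectionFault(Exception):
    """Stands in for the exception the processor raises on a blocked access."""

def access(entry: PageTableEntry, ring: int) -> int:
    """Return the physical frame if the current ring may touch this page."""
    if ring == USER_RING and not entry.user_accessible:
        raise ProtectionFault("ring 3 access to a ring 0 page")
    return entry.frame

kernel_page = PageTableEntry(frame=42, user_accessible=False)
program_page = PageTableEntry(frame=7, user_accessible=True)

print(access(program_page, USER_RING))       # prints 7
print(access(kernel_page, SUPERVISOR_RING))  # prints 42
```

Calling `access(kernel_page, USER_RING)` raises `ProtectionFault`, which is the behaviour that is supposed to keep user programs from learning anything about kernel memory.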

At least, that's the theory. The spate of patches and updates shows that somewhere this has broken down. This is where the big mystery lies.

Moving between rings
Here's what we do know. Every modern processor performs a certain amount of speculative execution. For example, given some instructions that add two numbers and then store the result in memory, a processor might speculatively do the addition before ascertaining whether the destination in memory is actually accessible and writeable. In the common case, where the location is writeable, the processor managed to save some time, as it did the arithmetic in parallel with figuring out what the destination in memory was. If it discovers that the location isn't accessible—for example, a program trying to write to an address that has no mapping and no physical location at all—then it will generate an exception and the speculative execution is wasted.

Intel processors, specifically—though not AMD ones—allow speculative execution of ring 3 code that writes to ring 0 memory. The processors do properly block the write, but the speculative execution minutely disturbs the processor state, because certain data will be loaded into cache and the TLB in order to ascertain whether the write should be allowed. This in turn means that some operations will be a few cycles quicker, or a few cycles slower, depending on whether their data is still in cache or not. As well as this, Intel's processors have special features, such as the Software Guard Extensions (SGX) introduced with Skylake processors, that slightly change how attempts to access memory are handled. Again, the processor does still protect ring 0 memory from ring 3 programs, but again, its caches and other internal state are changed, creating measurable differences.
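How a blocked access can still leave a measurable trace is easier to see in a toy model. Everything below is simulated in plain Python: the `ToyCPU` class, the cycle counts, and the idea of one cache line per possible byte value are assumptions for illustration (the last is borrowed from the published Meltdown description, not from the article), and nothing here touches real hardware.

```python
class ToyCPU:
    """Simulated CPU whose blocked read still warms one cache line."""

    def __init__(self, kernel_secret: int):
        self._kernel_secret = kernel_secret  # pretend ring 0 data, one byte
        self.cache = set()                   # indices of lines currently cached

    def user_read_kernel_byte(self) -> int:
        # Side effect of the (modelled) speculative work: a cache line
        # selected by the secret value gets loaded.
        self.cache.add(self._kernel_secret)
        # Architectural outcome: the access is blocked, no value is returned.
        raise PermissionError("ring 3 may not read ring 0 memory")

    def access_time(self, line: int) -> int:
        # Assumed costs: cached lines are fast, uncached lines are slow.
        return 1 if line in self.cache else 100

def recover_byte(cpu: ToyCPU) -> int:
    """Infer the secret purely from which line is fast afterwards."""
    try:
        cpu.user_read_kernel_byte()
    except PermissionError:
        pass  # the read itself failed, as it should
    timings = {line: cpu.access_time(line) for line in range(256)}
    return min(timings, key=timings.get)

print(recover_byte(ToyCPU(kernel_secret=0x2A)))  # prints 42
```

The read never returns data, yet the timing difference alone is enough to reconstruct the value in this model, which is the kind of "measurable difference" the paragraph above is describing.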

What we don't know, yet, is just how much kernel memory information can be leaked to user programs or how easily that leaking can occur. And which Intel processors are affected? Again it's not entirely clear, but indications are that every Intel chip with speculative execution (which is all the mainstream processors introduced since the Pentium Pro, from 1995) can leak information this way.

The first wind of this problem came from researchers from Graz Technical University in Austria. The information leakage they discovered was enough to undermine kernel mode Address Space Layout Randomization (kernel ASLR, or KASLR). ASLR is something of a last-ditch effort to prevent the exploitation of buffer overflows. With ASLR, programs and their data are placed at random memory addresses, which makes it a little harder for attackers to exploit security flaws. KASLR applies that same randomization to the kernel so that the kernel's data (including page tables) and code are randomly located.

The Graz researchers developed KAISER, a set of Linux kernel patches to defend against the problem.

If the problem were just that it enabled the derandomization of ASLR, this probably wouldn't be a huge disaster. ASLR is a nice protection, but it's known to be imperfect. It's meant to be a hurdle for attackers, not an impenetrable barrier. The industry reaction—a fairly major change to both Windows and Linux, developed with some secrecy—suggests that it's not just ASLR that's defeated and that a more general ability to leak information from the kernel has been developed. Indeed, researchers have started to tweet that they're able to leak and read arbitrary kernel data. Another possibility is that the flaw can be used to escape out of a virtual machine and compromise a hypervisor.

The solution that both the Windows and Linux developers have picked is substantially the same, and derived from that KAISER work: the kernel page table entries are no longer shared with each process. In Linux, this is called Kernel Page Table Isolation (KPTI).

With the patches, the memory address space is still split in two; it's just the kernel half is almost empty. It's not quite empty, because a few kernel pieces need to be mapped permanently, whether the processor is running in ring 3 or ring 0, but it's close to empty. This means that even if a malicious user program tries to probe kernel memory and leak information, it will fail; there's simply nothing to leak. The real kernel page tables are only used when the kernel itself is running.

This undermines the very reason for the split address space in the first place. The TLB now needs to clear out any entries related to the real kernel page tables every time it switches to a user program, putting an end to the performance saving that splitting enabled.

The impact of this will vary depending on the workload. Every time a program makes a call into the kernel—to read from disk, to send data to the network, to open a file, and so on—that call will be a little more expensive, since it will force the TLB to be flushed and the real kernel page table to be loaded. Programs that don't use the kernel much might see a hit of perhaps 2-3 percent—there's still some overhead because the kernel always has to run occasionally, to handle things like multitasking.

But workloads that call into the kernel a ton will see much greater performance drop off. In a benchmark, a program that does virtually nothing other than call into the kernel saw its performance drop by about 50 percent; in other words, each call into the kernel took twice as long with the patch than it did without. Benchmarks that use Linux's loopback networking also see a big hit, such as 17 percent in this Postgres benchmark. Real database workloads using real networking should see lower impact, because with real networks, the overhead of calling into the kernel tends to be dominated by the overhead of using the actual network.
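The relationship between "calls take twice as long" and "performance drops by 50 percent" is simple arithmetic, sketched below. The function names and the idea of describing a workload by the fraction of its run time spent in kernel calls are assumptions made for the example; the 3 percent figure is only a stand-in for a program that rarely calls the kernel.

```python
def slowdown(kernel_fraction: float, call_cost_multiplier: float) -> float:
    """Fractional increase in total run time when kernel calls get costlier.

    kernel_fraction: share of the original run time spent in kernel calls.
    call_cost_multiplier: how many times longer each call takes after patching.
    """
    return kernel_fraction * (call_cost_multiplier - 1.0)

def throughput_drop(kernel_fraction: float, call_cost_multiplier: float) -> float:
    """Fractional loss in work done per second for the same workload."""
    return 1.0 - 1.0 / (1.0 + slowdown(kernel_fraction, call_cost_multiplier))

# A program that does nothing but call the kernel, with calls twice as slow:
print(f"{throughput_drop(1.0, 2.0):.0%}")  # prints 50%

# A program spending an assumed 3% of its time in kernel calls:
print(f"{slowdown(0.03, 2.0):.0%}")  # prints 3%
```

The same per-call penalty therefore looks dramatic or negligible depending almost entirely on how much of the workload is kernel calls.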

While Intel systems are the ones known to have the defect, they may not be the only ones affected. Some platforms, such as SPARC and IBM's S390, are immune to the problem, as their processor memory management doesn't need the split address space and shared kernel page tables; operating systems on those platforms have always isolated their kernel page tables from user mode ones. But others, such as ARM, may not be so lucky; comparable patches for ARM Linux are under development.

TigerMonsoonDragon · Jan 4, 2018

Paratus said:
I’ve got to admit I’m feeling pretty good about having an unopened Threadripper and X399 sitting in my study ready to replace my old i7 920 / X58 PC.

Lol nice

I was gonna hold out for Ryzen next year (2019) but looks like I'ma do it sooner. Prolly turn my Xeon rig into an HTPC

jihe · Jan 4, 2018

Well the patch is out. Anyone actually doing some benchmarks to see the performance hit?

Shivansps · Jan 4, 2018

Atari2600 said:
If Microsoft were to apply the patch to both AMD and Intel equally - would they be open to legal action by AMD if they refused to add the 2 lines of code in that it would take to qualify patch applicability?

The thing is, if AMD can't prove their systems are 100% exploit-free, Microsoft will just patch everyone. The Meltdown paper makes clear that every OoO CPU with speculative execution may be affected either now or in the near future by another variation of the attack.

richaron · Jan 4, 2018

Shivansps said:
The thing is, if AMD can't prove their systems are 100% exploit-free, Microsoft will just patch everyone. The Meltdown paper makes clear that every OoO CPU with speculative execution may be affected either now or in the near future by another variation of the attack.

That's not a thing. Anyone who subscribes to the scientific method realises proof is difficult. No real scientist will say 100% earth will not be destroyed tomorrow.

But they will say the next best thing; which is what they say about AMD's chance of being susceptible to Meltdown.

Phynaz · Jan 4, 2018

One of my systems received the Windows patch this morning.

Shivansps · Jan 4, 2018

richaron said:
That's not a thing. Anyone who subscribes to the scientific method realises proof is difficult. No real scientist will say 100% earth will not be destroyed tomorrow.

But they will say the next best thing; which is what they say about AMD's chance of being susceptible to Meltdown.

So why take the chance? This flaw goes way beyond some Intel hardware bug; it's related to how OoO and speculative execution were designed and implemented, and that dates way back to the 90s.

Having to push another severe fix like this later because someone managed to make it work on AMD CPUs means another hard reset of servers for everyone. Not to mention that AMD is already vulnerable to Spectre. AMD is pounding its chest with "no one has managed to show Types 2 and 3 on our CPUs yet", and that's really not convincing at all.

bryanW1995 · Jan 4, 2018

Phynaz said:
So what you are saying is we should believe AMD's statement, but not Intel's or the researchers. How about ARM, should we believe their statements? Nvidia?

Paratus said:
Seems more like you’d like to avoid believing AMDs statements about their own processors in favor of Intels negative statements about AMD processors.

Wonder why that is?

Looks like Paratus already responded. To add to this, I'm not saying that anybody's statement is incorrect; they could all be, and likely are, correct. Go back and read the thread and you'll see that AMD said "near-zero chance" and addressed all 3 issues, while Intel said "there's an issue we're addressing with AMD and ARM + others" without mentioning which of those issues they're working on with the others and which affects Intel only. In the absence of a detailed Intel response like AMD's, it's reasonable to assume that the researchers' and AMD's responses are both correct.

Beer4Me · Jan 4, 2018

Truth be told: these are not exploits, but purposeful backdoors created by Intel for gov't agencies to secretly take control of systems remotely. Since it's been discovered, it has to be patched. But I'm sure Intel/Feds were hoping it was never brought to light.

richaron · Jan 4, 2018

Shivansps said:
So why take the chance?

Yeah good point. Proving a negative is difficult or impossible, so lets just assume... What? What's your point?

From all the researchers have seen, there's basically zero chance AMD is susceptible to Meltdown. And astronomers will say there's basically zero chance an asteroid will destroy earth tomorrow. You might not understand it, but this is how responsible people who subscribe to the scientific method will word their statements.

jpiniero · Jan 4, 2018

Shivansps said:
Having to push another severe fix like this later because someone managed to make it work on AMD cpus means another hard reset of servers for everyone.

On Linux at least you can force KPTI on and off. So if it comes to it you could force it enabled on AMD cpus if needed or if you wanted it for some reason.
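For anyone wanting to check what their box is doing, here is a rough sketch. It assumes a kernel new enough to have the `pti=` / `nopti` boot parameters and the `/sys/devices/system/cpu/vulnerabilities/meltdown` file; older or distro-patched kernels may expose neither, in which case the script just reports nothing found. The helper names are made up for the example.

```python
from pathlib import Path

def pti_boot_override(cmdline: str):
    """Return the PTI setting requested on the kernel command line, if any.

    Gives "on", "off" or "auto" for pti=..., "off" for nopti, else None.
    If several options appear, the last one on the line wins in this sketch.
    """
    result = None
    for token in cmdline.split():
        if token == "nopti":
            result = "off"
        elif token.startswith("pti="):
            result = token[len("pti="):]
    return result

def read_text_or_none(path: str):
    """Read a small text file, returning None if it is missing or unreadable."""
    try:
        return Path(path).read_text().strip()
    except OSError:
        return None

cmdline = read_text_or_none("/proc/cmdline") or ""
print("boot override:", pti_boot_override(cmdline))
print("kernel report:", read_text_or_none("/sys/devices/system/cpu/vulnerabilities/meltdown"))
```

Changing the setting means editing the bootloader's kernel command line and rebooting, so this only reads the current state.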

Phynaz · Jan 4, 2018

Paratus said:
Seems more like you’d like to avoid believing AMDs statements about their own processors in favor of Intels negative statements about AMD processors.

Wonder why that is?

I haven't read Intel's statements, as I never believe any company's PR. I read the researchers' papers that state AMD processors speculatively execute instructions after an exception just like Intel's. As do ARM's processors and IBM's. As the authors said, just because they didn't exploit this on an AMD CPU doesn't mean it's not exploitable. The researchers also said they couldn't get their code to cause the condition on ARM, but ARM has stated that some of their architectures are vulnerable.

We expect several more performance optimizations in modern CPUs which affect the microarchitectural state in some way, not even necessarily through the cache. Thus, hardware which is designed to provide certain security guarantees, e.g., CPUs running untrusted code, require a redesign to avoid Meltdown- and Spectre-like attacks

I might believe AMD if they disclose what the "near zero" statement means. I'm speculating it has something to do with their earlier statements about page faults. They claim this doesn't occur in AMD CPUs when a page fault occurs. What about if a page fault doesn't occur?

bryanW1995 · Jan 4, 2018

Phynaz said:
I haven't read Intel's statements. I read the researchers' papers that state AMD processors speculatively execute instructions after an exception just like Intel's. As do ARM's processors and IBM's. As the authors said, just because they didn't exploit this on an AMD CPU doesn't mean it's not exploitable. The researchers also said they couldn't get their code to cause the condition on ARM, but ARM has stated that some of their architectures are vulnerable.

Scroll up to post 360 and you can read Ars Technica's explanation; it makes it very clear why Meltdown only affects Intel (potentially every CPU since 1995 other than Itanium), plus some ARM CPUs.

moinmoin · Jan 4, 2018

Shivansps said:
So why take the chance? This flaw goes way beyond some Intel hardware bug; it's related to how OoO and speculative execution were designed and implemented, and that dates way back to the 90s.

Having to push another severe fix like this later because someone managed to make it work on AMD CPUs means another hard reset of servers for everyone. Not to mention that AMD is already vulnerable to Spectre. AMD is pounding its chest with "no one has managed to show Types 2 and 3 on our CPUs yet", and that's really not convincing at all.

AMD chips are affected by the way OoO and speculative execution work. But by all accounts so far AMD chips are not affected by unprivileged processes gaining access to privileged data (which is what variants 2 and 3 are about). This points to Intel having neglected privilege separation in the affected areas, while in AMD chips this separation appears to be respected and to work as intended, so neither Meltdown nor Spectre variant 2 attacks work as-is on AMD chips.

richaron · Jan 4, 2018

Phynaz said:
I haven't read Intel's statements. I read the researchers' papers that state AMD processors speculatively execute instructions after an exception just like Intel's. As do ARM's processors and IBM's. As the authors said, just because they didn't exploit this on an AMD CPU doesn't mean it's not exploitable. The researchers also said they couldn't get their code to cause the condition on ARM, but ARM has stated that some of their architectures are vulnerable.

You're talking about the much worse Meltdown bug which appears to only affect intel CPUs? This is the exploit the researchers tried on AMD and couldn't make it work?

I guess I'm not surprised you're still trying to pretend it affects all chips equally...

richaron · Jan 4, 2018

moinmoin said:
AMD chips are affected by the way OoO and speculative execution work. But by all accounts so far AMD chips are not affected by unprivileged processes gaining access to privileged data (which is what variants 2 and 3 are about). This points to Intel having neglected privilege separation in the affected areas, while in AMD chips this separation appears to be respected and to work as intended, so neither Meltdown nor Spectre variant 2 attacks work as-is on AMD chips.

Nope AMD chips aren't affected as far as the research shows. They tried it on AMD chips.

It's just intel chips which have this bug according to all the information we have now.

They simply can not rule out AMD chips being affected because they don't have a 100% robust proof. This isn't surprising in any scientific method because proving a negative is difficult or impossible. But again: all the information we have is that Meltdown does not affect AMD CPUs because it's only intel CPUs with this specific bug.
