x86's non-executable stack

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Why don't more OSes take advantage of the segment registers provided by x86? It's not exactly difficult to set them up, and doing so provides a LOT of protection against stack-busting buffer overflow exploits. I don't know the Windows memory map, but in Linux, where you keep the .text segment in a consistent location, it should be trivial. Is it really THAT useful to execute code on the stack?
 

glugglug

Diamond Member
Jun 9, 2002
5,340
1
81
Buffer overflow exploits almost always use the Heap, not the Stack.

Either way, this is a good question --- why is the heap (or stack) segment not separated from the code segment?
 

bsobel

Moderator Emeritus, Elite Member
Dec 9, 2001
13,346
0
0
Originally posted by: glugglug
Buffer overflow exploits almost always use the Heap, not the Stack.
Either way, this is a good question --- why is the heap (or stack) segment not separated from the code segment?

Actually they use both, but the stack is definitely more common (the heap is much harder to craft an attack for).

Why don't more OSes take advantage of the segment registers provided by x86

I think you're asking why more OSes don't take advantage of the ability to make memory read or read/write but not execute.

The answer is it's impossible to do this properly with the current x86 design. This is why MS added support in SP2 for the newer CPUs which do support this properly. I suspect we'll see more chips with this capability soon to take advantage of the change.

Bill
 

glugglug

Diamond Member
Jun 9, 2002
5,340
1
81
How can a buffer overflow use the stack?

Everything that gets put on the stack is a value of KNOWN fixed size, generally no more than 4 bytes. Complex structures and variable length items like strings are NOT passed around on the stack, a pointer to them is. This pointer points to a memory location on the heap containing the string or structure....
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: bsobel
Why don't more OSes take advantage of the segment registers provided by x86

I think you're asking why more OSes don't take advantage of the ability to make memory read or read/write but not execute.

The answer is it's impossible to do this properly with the current x86 design. This is why MS added support in SP2 for the newer CPUs which do support this properly. I suspect we'll see more chips with this capability soon to take advantage of the change.

Bill
The bolded sentence is incorrect. Lookie. Of course that patch alone isn't proof, but if you google a bit you'll find an explanation. Look at the 286 section. more info (note the "executable" bit).

Originally posted by: glugglug
How can a buffer overflow use the stack?

Everything that gets put on the stack is a value of KNOWN fixed size, generally no more than 4 bytes. Complex structures and variable length items like strings are NOT passed around on the stack, a pointer to them is. This pointer points to a memory location on the heap containing the string or structure....

Correct. That's the source of the whole problem. It happens when you allocate a fixed size buffer on the stack, then read in data from another source. For example, the following code is vulnerable to a buffer overflow exploit:

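#include <string.h>

void foo(char *stringFromUser) {
    char c[4];                  /* fixed 4-byte buffer on the stack */
    strcpy(c, stringFromUser);  /* no length check: longer input overruns c */
}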

c is allocated on the stack as 4 bytes. If the string from the user is more than 4 bytes (including the \0), strcpy keeps writing past the end of c and the stack starts getting overwritten. At least using Linux conventions, the first 4 bytes after c will typically be a saved register (ebp), followed by the return address, then the function's arguments and the caller's frame. If your user enters a long string and crafts it carefully, the return address can be made to point further up the stack, where the exploit code is located.
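
For contrast, here is a rough sketch of the same function with a length check in front of the copy. copy_bounded and foo_checked are names made up for illustration, not anything from a real library; the only point is that the copy gets refused when the input (plus its \0) doesn't fit in c.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from any library): copy src into dst only if it
 * fits, terminator included. Returns 0 on success, -1 if src is too long. */
int copy_bounded(char *dst, size_t dst_size, const char *src) {
    size_t len = strlen(src);
    if (dst_size == 0 || len >= dst_size)
        return -1;               /* would overflow: refuse, leave dst alone */
    memcpy(dst, src, len + 1);   /* +1 copies the trailing '\0' as well */
    return 0;
}

/* Same shape as foo(), but oversized input is rejected instead of
 * running over the saved ebp and the return address. */
int foo_checked(const char *stringFromUser) {
    char c[4];
    return copy_bounded(c, sizeof c, stringFromUser);
}
```

With this, foo_checked("abc") succeeds (3 characters plus the \0 fill c exactly), while foo_checked("abcd") is rejected before anything is written.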
 

bsobel

Moderator Emeritus, Elite Member
Dec 9, 2001
13,346
0
0
Originally posted by: glugglug
How can a buffer overflow use the stack? Everything that gets put on the stack is a value of KNOWN fixed size, generally no more than 4 bytes. Complex structures and variable length items like strings are NOT passed around on the stack, a pointer to them is. This pointer points to a memory location on the heap containing the string or structure....

Sigh, you have no clue how modern languages work or use the stack. The majority of overflows happen here.

The bolded sentence is incorrect

The keyword is "properly". There are hacks that attempt to do this by playing some tricks on undocumented Pentium features. However, these features may or may not work the same between releases of that class of CPU. The correct way is to have true native CPU support for this. We are finally starting to see this on consumer equipment. NT had this ability all along (to mark pages read only vs read and execute), however only the MIPS and Alpha CPUs supported it. It's still 'available' on x86 (e.g. the API will tag the page), but the underlying hardware just isn't capable of doing it (truly).

Bill

 

n0cmonkey

Elite Member
Jun 10, 2001
42,936
1
0
Originally posted by: bsobel

The keyword is "properly". There are hacks that attempt to do this by playing some tricks on undocumented Pentium features. However, these features may or may not work the same between releases of that class of CPU. The correct way is to have true native CPU support for this. We are finally starting to see this on consumer equipment. NT had this ability all along (to mark pages read only vs read and execute), however only the MIPS and Alpha CPUs supported it. It's still 'available' on x86 (e.g. the API will tag the page), but the underlying hardware just isn't capable of doing it (truly).

Bill

OpenBSD and Linux both manage non-exec stacks on x86.
http://www.deadly.org/article.php3?sid=20020724131711&mode=flat

Non-exec heaps are what they seem to have problems with because of the hardware.
http://www.deadly.org/article.php3?sid=20020826013453&mode=flat

According to the second post on the second link, non-exec stack is done by playing with the segments on x86. It is not per-page non-exec; x86 cannot handle that. x86_64, Sparc (sparc4m-sparc4u at least), and Alpha all support this (you say MIPS too, I'll have to take your word for it). So if we look at segments instead of pages, we can have non-exec stacks on x86.

And yes, I have almost no clue what I'm talking about. ;)

I'm just looking forward to x86_64 so I can have per-page permissions, and NX and all the other fun goodies OpenBSD has these days. My Ultrasparc has issues :p
 

bsobel

Moderator Emeritus, Elite Member
Dec 9, 2001
13,346
0
0
Originally posted by: glugglug
Buffer overflow exploits almost always use the Heap, not the Stack.
Either way, this is a good question --- why is the heap (or stack) segment not separated from the code segment?

Glug, found a good primer on buffer overflows for you here.

Bill
 

bsobel

Moderator Emeritus, Elite Member
Dec 9, 2001
13,346
0
0
Ok, dug up an article I was thinking of for Exec Shield on Linux. The full article is here. I'll pull some important parts out:

"It is commonly known that x86 pagetables do not support the so-called executable bit in the pagetable entries - PROT_EXEC and PROT_READ are merged into a single 'read or execute' flag. This means that even if an application marks a certain memory area non-executable (by not providing the PROT_EXEC flag upon mapping it) under x86, that area is still executable, if the area is PROT_READ. Furthermore, the x86 ELF ABI marks the process stack executable, which requires that the stack is marked executable even on CPUs that support an executable bit in the pagetables. This problem has been addressed in the past by various kernel patches, such as Solar Designer's excellent "non-exec stack patch". These patches mostly operate by using the x86 segmentation feature to set the code segment 'limit' value to a certain fixed value that points right below the stack frame."

and

"The exec-shield feature works via the kernel transparently tracking executable mappings an application specifies, and maintains a 'maximum executable address' value. This is called the 'exec-limit'. The scheduler uses the exec-limit to update the code segment descriptor upon each context-switch. Since each process (or thread) in the system can have a different exec-limit, the scheduler sets the user code segment dynamically so that always the correct code-segment limit is used.
the kernel caches the user segment descriptor value, so the overhead in the context-switch path is a very cheap, unconditional 6-byte write to the GDT, costing 2-3 cycles at most. Furthermore, the kernel also remaps all PROT_EXEC mappings to the so-called ASCII-armor area, which on x86 is the addresses 0-16MB. These addresses are special because they cannot be jumped to via ASCII-based overflows. E.g. if a buggy application can be overflown via a long URL:"

So yes, there are ways to hack up and somewhat mimic this behaviour (and Linux, BSD, etc now do this), but the x86 simply doesn't support this today.
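
To make the exec-limit idea from the quote a bit more concrete, here is a toy model in C. It is only a sketch under my own assumptions: struct mapping, exec_limit and fetch_allowed are hypothetical names, not kernel code, and the limit is treated as "one past the highest executable byte" to keep the arithmetic simple (the real segment limit is the last valid offset).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one user mapping; not a kernel structure. */
struct mapping {
    uint32_t start;   /* first byte of the mapping */
    uint32_t end;     /* one past the last byte */
    int      exec;    /* nonzero if the app asked for PROT_EXEC */
};

/* 'Maximum executable address': the highest end of any executable mapping. */
uint32_t exec_limit(const struct mapping *maps, size_t n) {
    uint32_t limit = 0;
    for (size_t i = 0; i < n; i++) {
        if (maps[i].exec && maps[i].end > limit)
            limit = maps[i].end;
    }
    return limit;
}

/* Model of the code-segment limit check: instruction fetches are only
 * allowed below the limit, whatever the page tables say. */
int fetch_allowed(uint32_t addr, uint32_t limit) {
    return addr < limit;
}
```

With only a text mapping marked executable, the stack near the top of the address space ends up above the limit and can't be fetched from. Anything that sits below the limit is still fetchable even if it isn't marked PROT_EXEC, which is presumably why the quoted article bothers to move executable mappings down low.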

Bill
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: bsobel
So yes, there are ways to hack up and somewhat mimic this behaviour (and Linux, BSD, etc now do this), but the x86 simply doesn't support this today.

Bill

As long as you use the stack normally, the mimicry will work fine, so I'm saying it does. I guess we agree that it can be done, you just don't consider the method of implementation acceptable.
 

bsobel

Moderator Emeritus, Elite Member
Dec 9, 2001
13,346
0
0
As long as you use the stack normally, the mimicry will work fine, so I'm saying it does. I guess we agree that it can be done, you just don't consider the method of implementation acceptable.

No, I'm just saying the hardware doesn't natively support it. Everything that tries on x86 is a hack. PaX is the best done (IMHO) of the group. But read through what they actually do here and you'll see the hoops they need to jump through in order to make this work, and why you can't guarantee this will remain usable moving forward. Your original question was why people don't take advantage of the segment registers; the answer remains that they do not work: the READ and EXEC flags are actually implemented as one (see the previous posts on this).

Bill
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: bsobel
As long as you use the stack normally, the mimicry will work fine, so I'm saying it does. I guess we agree that it can be done, you just don't consider the method of implementation acceptable.

No, I'm just saying the hardware doesn't natively support it. Everything that tries on x86 is a hack. PaX is the best done (IMHO) of the group. But read through what they actually do here and you'll see the hoops they need to jump through in order to make this work, and why you can't guarantee this will remain usable moving forward. Your original question was why people don't take advantage of the segment registers; the answer remains that they do not work: the READ and EXEC flags are actually implemented as one (see the previous posts on this).

Bill

Link.
To fetch instructions the CPU unconditionally uses the CS register.
If you execute beyond the end of the code segment, you get an exception. (see 9.8.13)
edit: see also here, 6.3.1.2 and 6.3.1.1 and 6.3.3:
With the 80386, control transfers are accomplished by the instructions JMP,
CALL, RET, INT, and IRET, as well as by the exception and interrupt
mechanisms. Exceptions and interrupts are special cases that Chapter 9
covers. This chapter discusses only JMP, CALL, and RET instructions.

The "near" forms of JMP, CALL, and RET transfer within the current code
segment, and therefore are subject only to limit checking. The processor
ensures that the destination of the JMP, CALL, or RET instruction does not
exceed the limit of the current executable segment.
This limit is cached in
the CS register;...
See the bolded statement.

Also see section 6.5.


Segments are NOT the same as page table entries (which you correctly state do not support executable permissions). Am I missing something?
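
To make the segment side concrete, here is a small sketch that pulls the executable bit and the limit out of an 8-byte segment descriptor, using the layout from the Intel manuals. The function names are hypothetical, and it only makes sense for ordinary code/data descriptors (S=1), not system descriptors.

```c
#include <stdint.h>

/* Bit 43 of the 8-byte descriptor (bit 3 of the type field):
 * set for code segments, clear for data segments. */
int desc_is_executable(uint64_t d) {
    return (int)((d >> 43) & 1);
}

/* Last valid offset in the segment, honoring the granularity (G) bit. */
uint32_t desc_effective_limit(uint64_t d) {
    uint32_t low  = (uint32_t)(d & 0xFFFF);         /* limit bits 15:0  */
    uint32_t high = (uint32_t)((d >> 48) & 0xF);    /* limit bits 19:16 */
    uint32_t limit = low | (high << 16);
    if ((d >> 55) & 1)                              /* G=1: 4 KB units */
        limit = (limit << 12) | 0xFFF;
    return limit;
}
```

For the usual flat code descriptor 0x00CF9A000000FFFF the limit decodes to 0xFFFFFFFF, i.e. the whole 4 GB, so a flat setup lets CS reach the stack. Shrinking the limit field is what makes a fetch from higher addresses fault. The matching flat data descriptor 0x00CF92000000FFFF has the executable bit clear.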
 

n0cmonkey

Elite Member
Jun 10, 2001
42,936
1
0
Originally posted by: bsobel
As long as you use the stack normally, the mimicry will work fine, so I'm saying it does. I guess we agree that it can be done, you just don't consider the method of implementation acceptable.

No, I'm just saying the hardware doesn't natively support it. Everything that tries on x86 is a hack. PaX is the best done (IMHO) of the group. But read through what they actually do here and you'll see the hoops they need to jump through in order to make this work, and why you can't guarantee this will remain usable moving forward. Your original question was why people don't take advantage of the segment registers; the answer remains that they do not work: the READ and EXEC flags are actually implemented as one (see the previous posts on this).

Bill

One of the problems I have with PaX is that it will break applications. I look forward to trying it out soon though (need a spare hard drive to play with :D).
 

glugglug

Diamond Member
Jun 9, 2002
5,340
1
81
Originally posted by: CTho9305
Originally posted by: bsobel
Why don't more OSes take advantage of the segment registers provided by x86

I think you're asking why more OSes don't take advantage of the ability to make memory read or read/write but not execute.

The answer is it's impossible to do this properly with the current x86 design. This is why MS added support in SP2 for the newer CPUs which do support this properly. I suspect we'll see more chips with this capability soon to take advantage of the change.

Bill
The bolded sentence is incorrect. Lookie. Of course that patch alone isn't proof, but if you google a bit you'll find an explanation. Look at the 286 section. more info (note the "executable" bit).

Originally posted by: glugglug
How can a buffer overflow use the stack?

Everything that gets put on the stack is a value of KNOWN fixed size, generally no more than 4 bytes. Complex structures and variable length items like strings are NOT passed around on the stack, a pointer to them is. This pointer points to a memory location on the heap containing the string or structure....

Correct. That's the source of the whole problem. It happens when you allocate a fixed size buffer on the stack, then read in data from another source. For example, the following code is vulnerable to a buffer overflow exploit:

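#include <string.h>

void foo(char *stringFromUser) {
    char c[4];                  /* fixed 4-byte buffer on the stack */
    strcpy(c, stringFromUser);  /* no length check: longer input overruns c */
}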

c is allocated on the stack as 4 bytes. If the string from the user is more than 4 bytes (including the \0), strcpy keeps writing past the end of c and the stack starts getting overwritten. At least using Linux conventions, the first 4 bytes after c will typically be a saved register (ebp), followed by the return address, then the function's arguments and the caller's frame. If your user enters a long string and crafts it carefully, the return address can be made to point further up the stack, where the exploit code is located.

I see it now. For some reason when I think of the stack what comes to mind is parameter passing between functions. I hadn't thought about it being used for function local variables.
 

bsobel

Moderator Emeritus, Elite Member
Dec 9, 2001
13,346
0
0
Segments are NOT the same as page table entries (which you correctly state do not support executable permissions). Am I missing something?

No, I was thinking you meant page table entries. The reason that segment selectors aren't used is that in modern OSes they are basically set up as one large flat address space, with page table entries being used instead to provide the needed protection.

If you look at the Windows memory map (you could use Linux too for this example) you'll see the kernel user mode code loaded at the 'top' of the address space with application user mode code loaded 'at the bottom' (not completely accurate, as there are some reserved areas to protect against bad pointers and whatnot). Somewhere mixed in with the application code is the app's heap, stack, etc.

Given the ability to dynamically load libraries (at any time) into the address space, you quickly wind up with the application code being a nice 'mixed' environment of code and data. There just isn't a good way to set up the cs register to limit what area is code. You can try (that's what some of the hacks do) by limiting the register based on the code you're going to execute (where it is), but what happens when it makes a call to kernel or another module?

Bill
 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: bsobel
but what happens when it makes a call to kernel or another module?

Bill

Well, when you make a call to the kernel, a context switch happens anyway, so the segment could be updated, but dynamic libraries in Linux are located between the heap and stack, so supporting that might be a bit of a mess.
 

bsobel

Moderator Emeritus, Elite Member
Dec 9, 2001
13,346
0
0
Well, when you make a call to the kernel, a context switch happens anyway, so the segment could be updated, but dynamic libraries in linux are located between the heap and stack, so supporting that might be a bit of a mess.

I didn't mean kernel as in ring 0, I meant the user mode components of the OS. On Windows (for example), user32.dll, gdi32.dll, etc. Now, true, eventually many of the functions wind up doing something that calls down to ring 0, but there's a lot of glue code that's at ring 3, so the address space is just a mess ;)

If Intel had originally implemented page restrictions properly the point would be moot anyhow; maybe we'll finally get them (not that it's a complete silver bullet, there are still ways of getting around non-exec stacks and heaps, but it will help).

Bill
 