Multi-threading at the assembly level

Status
Not open for further replies.

chrstrbrts

Senior member
Aug 12, 2014
Hey Guys,

I'm finishing a book right now on assembly for the x86. It focuses mostly on 32-bit systems but does touch upon some 64-bit programming.

Anyway, it's a large book and I only have 70 pages left to go. However, there has been no mention of threading whatsoever. Further, upon reviewing the contents, it doesn't look like threads will be mentioned in the last 70 pages either.

Are threads higher level software abstractions? Do they exist below the OS / compiler / HLL level at all?

Intel processors have something called hyper-threading. However, after a cursory Wikipedia review, it appears that hyper-threading is a way for a processor to execute code from two different processes within the same clock period, not two different threads from the same process.

I'm beginning to think that a processor has no notion of threading or any "instruction bundling" at all and just executes instructions as they come.

Am I correct?

If I am, is it possible to fake a multi-threaded execution?

Example:

Let's say I'm executing a process, and at some point the process entails two sub-tasks that, at the most granular level (the level of assembly code), have nothing to do with each other.

Let's say sub-task 1's job is to put the string "Hello!" on the console 10 times, and sub-task 2's job is to add the members of an array.

Can I simulate a multi-threaded experience like this:

(pseudo assembly)

thread 1:
send "Hello!" to console
send "Hello!" to console
check thread 2's counter
jump to thread 2 if condition holds
unconditional jump to thread 1

thread 2:
add vector element i to accumulator
add vector element i+1 to accumulator
check thread 1's counter
jump to thread 1 if condition holds
unconditional jump to thread 2

I've essentially sort of faked a multi-threaded experience, haven't I?

Also, when you use a HLL to write multi-threaded code, when it's compiled, do you get something like what I've created?

Thanks.
 

Ken g6

Programming Moderator, Elite Member
Dec 11, 1999
That looks like a quick-and-dirty (emphasis on dirty :p) cooperative multitasking system on a single core, and a single real thread.

I'm pretty sure you need OS cooperation to do real threads on multiple cores.

You should really get a book on operating systems.
 

Tuna-Fish

Golden Member
Mar 4, 2011
Also, when you use a HLL to write multi-threaded code, when it's compiled, do you get something like what I've created?

Sort of, but not quite. The piece you're missing is everything that happens on the kernel side. The illusion that the operating system and the CPU provide to the running process is that it executes a continuous stream of instructions and controls what the CPU runs. In reality, every now and then a timer interrupt fires (typically about once every millisecond on modern systems) and transfers control to kernel code. That kernel code (among other things) runs the scheduler, which determines which process/thread should be running; if it's different from the one currently loaded, the kernel saves the current state, replaces the memory mappings and register state with those of the other process, and then returns control to it.

The important thing to remember is that this is not done by the compiler and the compiled code does not need to co-operate with the OS to achieve this. In your example, the compiled code for both threads would just contain the tasks relevant to their own work. Then, the OS could interrupt either at any point, and switch control over to the other one.

The reason this isn't in your assembly book is that in universities, it is typically covered in the OS course, which comes after learning assembly.
 

chrstrbrts

Senior member
Aug 12, 2014
That looks like a quick-and-dirty (emphasis on dirty :p) cooperative multitasking system on a single core, and a single real thread.

I'm pretty sure you need OS cooperation to do real threads on multiple cores.

You should really get a book on operating systems.

Yes, but according to what you've linked to, cooperative multitasking involves switching between processes. I want to switch between threads within the same process.

Sort of, but not quite. The piece you're missing is everything that happens on the kernel side. The illusion that the operating system and the CPU provide to the running process is that it executes a continuous stream of instructions and controls what the CPU runs. In reality, every now and then a timer interrupt fires (typically about once every millisecond on modern systems) and transfers control to kernel code. That kernel code (among other things) runs the scheduler, which determines which process/thread should be running; if it's different from the one currently loaded, the kernel saves the current state, replaces the memory mappings and register state with those of the other process, and then returns control to it.

The important thing to remember is that this is not done by the compiler and the compiled code does not need to co-operate with the OS to achieve this. In your example, the compiled code for both threads would just contain the tasks relevant to their own work. Then, the OS could interrupt either at any point, and switch control over to the other one.

The reason this isn't in your assembly book is that in universities, it is typically covered in the OS course, which comes after learning assembly.

OK. I'm familiar with clock interrupts initiating the switching of processes. I understand how the OS discerns one process from another e.g. process control blocks, process IDs, etc.

But, how does the OS discern between threads within the same process?

If I write multi-threaded code for one process using the concurrency classes in Java for example or any other HLL library, after compiling what does the resulting assembly / machine code look like?

The assembler has to add "handling" information for the OS so it can tell where one thread ends and the others begin.

Yes?
 

Ken g6

Programming Moderator, Elite Member
Dec 11, 1999
I want to switch between threads within the same process.
Take a look at my first link, to Wikipedia.

Wikipedia said:
"clone" is a system call in the Linux kernel that creates a child process that may share parts of its execution context with the parent. It is often used to implement threads (though programmers will typically use a higher-level interface such as pthreads, implemented on top of clone). It was inspired by Plan 9's rfork, but without the "separate stacks" feature, which according to Linus Torvalds causes too much overhead.

The point of both links is that threads are handled in the OS kernel.

The assembler has to add "handling" information for the OS so it can tell where one thread ends and the others begin.

Yes?
Generally, I think it works a lot like fork(). You wind up with two threads running the same code after a call, with nothing to distinguish them except an ID number generated by the kernel. Each thread reads that and does its thing.
 

Tuna-Fish

Golden Member
Mar 4, 2011
But, how does the OS discern between threads within the same process?

Under many OSes, for example old Linux versions, threads were implemented simply as processes that shared all memory maps with each other. The handling of preemption and register state was exactly the same as for separate processes. These days, Linux has a separate concept of a thread, but the interface provided is still similar.

The assembler has to add "handling" information for the OS so it can tell where one thread ends and the others begin.

You spawn new threads by calling a function that runs a syscall. There are other functions that allow you to manage threads. However, choosing which thread to run at which time is done by the scheduler, and is done through pre-emption, not through manipulating the code that gets emitted.
 

Merad

Platinum Member
May 31, 2010
But, how does the OS discern between threads within the same process?

IIRC Linux still has the notion of user mode threads and kernel mode threads. User mode threads avoid the overhead of calls into the kernel, but the massive downside is that they never actually run concurrently, because setting that up isn't possible in user land. The only actual concurrent threads are kernel threads, which AFAIK are essentially just processes. The kernel just tinkers with their various identifiers a bit so that they are all grouped into the same process and have access to the same memory.

You may be interested in a project I did a few years ago that is a very bare bones microkernel for a microcontroller. It does cooperative multithreading with up to eight threads using a round robin scheduler. If you're interested in tinkering with this stuff, I highly recommend microcontrollers. They're cheap and accessible and you can dive straight into things like building little kernels without the mountains of legacy nastiness in x86. I was actually planning to expand that project into a full preemptive multitasking setup, but never got around to it.
 

exdeath

Lifer
Jan 29, 2004
Threading is an OS paradigm that has nothing to do with assembly programming.

Threads within a single process will use the same page table directory and process context as their parent process; that's about the only difference.

Hyperthreading is something else and has nothing to do with OS threading per se. It's a scheduler optimization: it allows the scheduling hardware within the CPU to interleave more than one instruction stream, filling in gaps to improve the efficiency of instruction reordering and execution unit utilization. Naturally this requires duplicate contexts, including registers, etc., and since the operating system is responsible for providing the contexts of the two instruction streams hyperthreading uses, it's handled and presented as virtual or logical cores.

Hyperthreading exposes itself to the OS by presenting two execution contexts, complete with duplicate register sets, etc. The defining factor is that it does NOT duplicate the actual execution hardware (ALU, address calculation units, floating point unit, etc), just the front end context needed to track and run two simultaneous threads in the scheduler. Hence the "logical" cores. The details elude me at the moment but if I recall this is something exposed by the BIOS ACPI MPS tables the same way as true SMP with multiple real cores and sockets.

It works best when you have 2 threads that are monopolizing certain functions exclusively (eg: thread 1 all floating point, thread 2 all integer, thread 3 all memory addressing and load/store). You'll see very little benefit trying to hyperthread a second instruction stream if the first one is extremely optimized and overlapping multiple types of instructions and using all available execution resources at the same time.

While you can manipulate multiple cores through special APIC interrupts to tell the other processors where to execute, this has nothing to do with the topics of OS threading, scheduling, and multitasking.

Modern OS's use pre-emptive multi tasking where a timer interrupt is reset to whatever the thread quantum is (on the order of 20-100 milliseconds) each and every time before doing a fake iret to ring 3 in the user process (yay x86). When the timer expires, it triggers a hardware interrupt which follows the typical x86 task state segment mechanism to transition back to ring 0 kernel space where control returns to the scheduler to change contexts to the next thread. Unless of course the thread invokes a syscall in which case the timer is cancelled and the remaining thread quantum is voluntarily surrendered.

Ever notice that something running in a cmd prompt (say, del *.* in a folder with millions of tiny files) runs faster if you minimize the window? When it's not minimized it has to print each line to the screen, and each print is a syscall to interact with the console, which suspends the thread and invokes a context change; that overhead, repeated that many times, can dwarf even the file I/O overhead. Of course it's still making the same NtDeleteFile (or whatever) calls, but it's kind of pointless to give up its remaining time calling NtDeleteFile, go to the back of the line, finally come up again in the scheduler, then immediately call NtWriteFile to STDOUT and interrupt itself again, and so on.

BTW: https://en.wikipedia.org/wiki/Inter-processor_interrupt is how you actually start and stop CPUs other than your boot CPU. The mechanism is to query the BIOS for the number of CPUs and their IDs. Then you write the ID of the processor you want to target and the interrupt vector you want it to go to (typically another scheduler or some idle loop/monitor thread) with a normal register mov to the multiprocessor interrupt controller, and the hardware does the rest. The CPU you send the interrupt to will be interrupted like any other interrupt and go to the vector you specified.

As far as threads in the same process, just like file handles, allocated memory pages, and everything else, there are per process data structures the OS creates and manages. Switching threads in the same process just invokes a register context change, keeps the same memory map (CR3), and changes FS/GS to point to the in process TIB for that thread. When and how is up to the OS designer. I'm just guessing for Windows/Linux but I would think each thread in a process is round robin whenever the parent process comes up in the scheduler? Or prioritize threads to run within the same process before switching? Whatever the choice, I'm sure care is taken to avoid excessive CR3 changes (page table directory for virtual memory map) and TLB flushes. Or maybe not? I know x86 and understand operating systems but I didn't actually write Windows or Linux so I'm trying to generalize here... :p

Many of the questions you ask are user choice, user being the operating system designer. Other than the cast in stone hardware mechanisms, most of these questions are up to you as far as how you want to prioritize or schedule, and the best answer is going to be "well... this is how they do it in Windows or Linux because x..."
 

exdeath

Lifer
Jan 29, 2004
Just wanted to add that even within the same process, separate threads have their own stacks; that is a pretty significant defining characteristic of a thread. In addition to its own register context, saved and reloaded when a thread is stopped and started, a thread needs its own stack to keep track of its own local variables and function calls in order to be able to run independently. I thought this was implied by the context change that includes EBP/ESP/RBP/RSP, but it deserves explicit highlighting. In fact, the first 3 DWORDs of a Windows Thread Information Block, pointed to by FS, are stack related: current exception handling frame, stack base, and stack limit.

While not required for single core multi tasking if managed carefully, you'd run into problems the moment two threads ran simultaneously on real SMP hardware and modified the same call stack without the other being aware of it. It's very easy to believe something works a certain way when it's on a single processor but things break down very quickly when there is actual simultaneous execution and shared memory. Even if a single core was assumed, you'd have to leave the stack the way you found it before stopping one thread and returning to the other. Otherwise you would have the wrong call frame and wrong return address and even wrong frame sizes when resuming another thread.

Threading, multitasking, SMP, etc., and the various problems and solutions for race conditions, atomic memory access, and dependency synchronization, are a huge topic all their own.
 

Ken g6

Programming Moderator, Elite Member
Dec 11, 1999
This thread seems to be done - locking it before it goes further off-topic -- Programming Moderator Ken g6
 