How do operating systems access different cores?

Cogman

Lifer
Sep 19, 2000
10,284
138
106
So, I've become interested in ASM programming recently, and had this question that has been nagging at me. How does the operating system tell the processor which core to run code on? I know it has to be done at the OS level, and it has to be some sort of specific assembly instruction, but I can't find anything anywhere beyond an explanation of how threading works (and that isn't how threading works on different cores either).

My thoughts are that each core has its own interrupt timer with some sort of signaling method to tell the OS which core it is that sent the interrupt. The OS receives the interrupt like normal and returns like normal, only somehow the iret instruction has been tweaked to go to the core that called the interrupt rather than some other core.

I know there has to be some communication going on to tell which core is which. Otherwise you couldn't set processor affinity in the OS.

So is it really just scheduling certain tasks for certain processors?
 

degibson

Golden Member
Mar 21, 2008
1,389
0
0
Well, it's complicated, and it differs a lot by ISA, and even within an ISA.

I'm more familiar with Sun's SPARC products at this level of detail, so what I say might be Sun- or SPARC-specific in some places. I'll try to keep things as general as possible. x86 experts please jump in and correct me if I overstep the measly capabilities of x86. :)

Quick note: Back in the days of SMPs, processors were on discrete chips. There was no notion of a "core". A lot of inter-core communication works exactly the same today as it did then, as far as the interface is concerned. Hence, I use 'processor' in the discussion below to mean 'core' in most cases. When I mean the whole chip, I'll say 'chip' or 'CPU'.

Boot
Usually, at reset time, only processor 0 comes alive and runs code. It starts at some pre-determined address, in some pre-determined state. Usually that memory area is populated by a ROM that tells the processor where to jump to in order to find the boot ROM.

The other processors are in a halt state at this point. As part of the boot process, processor 0 populates the interrupt vector tables. Once they are populated, processor 0 sends inter-processor-interrupts (IPIs) to the other processors to bring them out of halt. They start execution at the relevant interrupt handler. Each one 'becomes' an OS thread, essentially.

Normal Operation
There are two main approaches for handling interrupts, including timer interrupts (usually global) on threaded machines.
1. Give all non-IPI interrupts to processor 0. Have processor 0 reassign work to other cores as necessary with IPIs.
2. Rotate interrupts among all processors for load balancing. Literally point the interrupt unit to a new processor after each interrupt.

Sorry, I'm not sure which OSes and platforms use which approach. But the high-level point is that as soon as the OS gains control of any one core (e.g., from a timer going off or a system call), it can send IPIs to gain control of the others.

The exact mechanism for sending IPIs is platform-specific. It is usually a memory-mapped privileged interrupt controller. Basically, an IPI is a way for a core running privileged code to poke another core, running arbitrary code. They're commonly used for TLB shootdown, scheduling, etc.

(This last part might be specific to SPARC/Solaris hypervisor code): 'Core IDs' come in two flavors. There is a physical id, known only to the hardware (essentially), based on where the core lies in the virtual backplane, and there is an ID visible to the OS. As part of the boot process, the booting core establishes the virtual-id-to-physical-id mappings before activating the other cores, usually just by assigning contiguous numbers, 1..N (0 is taken already) to the (potentially non-contiguous) physical IDs. It is by the virtualized IDs that the IPIs are sent.
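To make the ID mapping above concrete, here is a minimal sketch in C. It is not how Solaris or any real hypervisor does it; `cpu_map`, `cpu_map_build` and `cpu_map_to_phys` are hypothetical names, and the only thing taken from the post is the idea that the boot core keeps ID 0 and the rest get contiguous numbers in discovery order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_CPUS 64
#define NO_CPU   UINT32_MAX

/* Hypothetical table: index is the OS-visible ID, value is the physical ID. */
typedef struct {
    uint32_t phys_id[MAX_CPUS];
    size_t   count;
} cpu_map;

/* Assign contiguous OS-visible IDs. The boot core always becomes ID 0;
 * every other present core gets the next free number, in discovery order. */
void cpu_map_build(cpu_map *m, uint32_t boot_phys,
                   const uint32_t *present, size_t n)
{
    m->phys_id[0] = boot_phys;
    m->count = 1;
    for (size_t i = 0; i < n; i++) {
        if (present[i] == boot_phys)
            continue;                    /* already mapped as ID 0 */
        assert(m->count < MAX_CPUS);
        m->phys_id[m->count++] = present[i];
    }
}

/* Translate an OS-visible ID back to the physical ID used to target an IPI. */
uint32_t cpu_map_to_phys(const cpu_map *m, size_t virt_id)
{
    return virt_id < m->count ? m->phys_id[virt_id] : NO_CPU;
}
```

With physical IDs {8, 0, 16, 24} and core 8 booting, the OS would see IDs 0..3 mapping to 8, 0, 16, 24.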

Lastly, there is another communication mechanism available, but it is seldom used for explicit communication in OS code: shared memory (duh). Once processors can address the same memory, they could use shared memory to communicate instead (though shared memory provides no way to interrupt user code, for instance).
 

Cogman

Thanks degibson, that's exactly what I was looking for.

Upon googling "inter-processor interrupts", from what I can tell it looks like Windows uses the first method that you mentioned. For anyone interested, it specifically uses IRQ 29h for the inter-processor communication.

Again, thanks for the informative post.
 

Schmide

Diamond Member
Mar 7, 2002
5,688
921
126
Read up on x2APIC in the Intel documentation. There really is no interrupt call; writing to the Interrupt Command Register dispatches the inter-processor interrupt.

I don't know exactly the specifics as degibson laid them out, but I believe only one processor is assigned the regular APIC map for hardware interrupts and the rest must be dispatched through the x2APIC interface virtually.

The way I see booting another core: first, a segment containing thread setup is loaded into an entry in the Interrupt Descriptor Table. Then an interrupt is sent out through the x2APIC. The processor receiving this interrupt loads the Task State Segment and executes the code. From there it can load whatever register and descriptor tables are needed, then drop out of the interrupt and begin execution.
 

dinkumthinkum

Senior member
Jul 3, 2008
203
0
0
Upon power-up, one logical processor (physical CPU, core, etc.) is selected as the bootstrap processor (BSP). The BIOS sets up every other logical processor as an Application Processor (AP). It then proceeds to execute CLI HLT on every processor but the BSP. The BIOS then populates some tables in memory which contain information about the system: how many CPUs and their IDs, IRQ assignments, etc. Intel defined the MultiProcessor Specification many years ago and it specifies the table layout. ACPI also has its own similar tables. Most systems create both kinds of tables for compatibility reasons. Normal uniprocessor boot proceeds on the BSP: load boot sector at 0x7C00 etc.
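As a rough illustration of how an OS locates the MP tables mentioned above: the MultiProcessor Specification defines a 16-byte "floating pointer structure" that starts with the signature "_MP_", sits on a 16-byte boundary, and whose bytes sum to zero modulo 256. The sketch below scans an in-memory buffer for it. The function names are hypothetical, and a real kernel would point this at the low-memory regions the spec lists rather than at an arbitrary buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MP_FPS_SIZE 16   /* the floating pointer structure is one 16-byte paragraph */

/* Sum of all bytes, modulo 256. A valid structure sums to zero. */
uint8_t byte_sum(const uint8_t *p, size_t n)
{
    uint8_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum = (uint8_t)(sum + p[i]);
    return sum;
}

/* Scan a region on 16-byte boundaries for a valid "_MP_" structure.
 * Returns the byte offset of the first match, or -1 if none is found. */
long mp_find_fps(const uint8_t *region, size_t len)
{
    assert(region != NULL);
    for (size_t off = 0; off + MP_FPS_SIZE <= len; off += MP_FPS_SIZE) {
        const uint8_t *p = region + off;
        if (memcmp(p, "_MP_", 4) == 0 && byte_sum(p, MP_FPS_SIZE) == 0)
            return (long)off;
    }
    return -1;
}
```

The checksum test matters because the four signature bytes alone can show up in unrelated memory.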

It is the job of the OS running on the BSP to parse the tables and set up the system. Now every logical processor has what is called a Local Advanced Programmable Interrupt Controller (LAPIC). It is typically mapped into memory starting at physical 0xFEE00000 on 32-bit machines. By writing values in a certain manner, you can cause the LAPIC to send an interprocessor interrupt to another LAPIC given an ID (there are a few ways to set up IDs, but that is for later).
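"Writing values in a certain manner" boils down to two 32-bit registers in the classic (xAPIC) layout: the high half of the Interrupt Command Register at offset 0x310 takes the destination APIC ID in bits 24..31, and writing the low half at offset 0x300 actually sends the IPI. Here is a minimal sketch, assuming a fixed-delivery IPI in physical destination mode. `lapic_write`, `lapic_read` and `lapic_send_fixed_ipi` are hypothetical helper names, and `base` would be the mapped LAPIC page in a kernel (a plain array works for trying it out in user space).

```c
#include <assert.h>
#include <stdint.h>

#define LAPIC_ICR_LOW        0x300u      /* writing this register sends the IPI */
#define LAPIC_ICR_HIGH       0x310u      /* destination APIC ID in bits 24..31  */
#define ICR_DELIVERY_PENDING (1u << 12)  /* read-only: send still in progress   */
#define ICR_LEVEL_ASSERT     (1u << 14)

void lapic_write(volatile uint32_t *base, uint32_t offset, uint32_t value)
{
    base[offset / 4] = value;
}

uint32_t lapic_read(volatile uint32_t *base, uint32_t offset)
{
    return base[offset / 4];
}

/* Send a fixed-delivery IPI with the given vector to one LAPIC. */
void lapic_send_fixed_ipi(volatile uint32_t *base,
                          uint8_t dest_apic_id, uint8_t vector)
{
    assert(vector >= 32);  /* vectors 0..31 are reserved for CPU exceptions */

    /* High half first: the write to the low half is what triggers the send. */
    lapic_write(base, LAPIC_ICR_HIGH, (uint32_t)dest_apic_id << 24);
    lapic_write(base, LAPIC_ICR_LOW, ICR_LEVEL_ASSERT | vector);

    /* On real hardware, wait until the LAPIC reports the IPI as sent. */
    while (lapic_read(base, LAPIC_ICR_LOW) & ICR_DELIVERY_PENDING)
        ;
}
```

In x2APIC mode the same information goes into a single 64-bit MSR write instead of two memory-mapped registers, so treat the offsets here as xAPIC-only.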

The BSP must set up bootstrap code somewhere in memory and send a particular sequence of IPIs to wake an AP from CLI HLT (INIT-SIPI-SIPI: one INIT IPI followed by two Startup IPIs), respect some delays, and poll to see if the processor is doing something (you can have it write a value in a known location to indicate this, for example).
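For reference, the usual x86 wake-up sequence is one INIT IPI followed by two Startup IPIs (SIPIs), where the SIPI vector is the 4 KiB page number of the real-mode bootstrap code. The sketch below only builds the three ICR values and the commonly quoted delays (about 10 ms after INIT, about 200 µs after each SIPI); it does not touch hardware. `ipi_step` and `build_ap_startup` are hypothetical names, and the delays are the conventional figures rather than something every chipset requires.

```c
#include <assert.h>
#include <stdint.h>

#define ICR_DM_INIT     (5u << 8)   /* delivery mode 101b: INIT     */
#define ICR_DM_STARTUP  (6u << 8)   /* delivery mode 110b: Start-up */
#define ICR_ASSERT      (1u << 14)

/* One step of the wake-up sequence: ICR values plus the wait afterwards. */
typedef struct {
    uint32_t icr_high;   /* destination APIC ID in bits 24..31 */
    uint32_t icr_low;    /* delivery mode, level, vector       */
    uint32_t delay_us;   /* how long to wait after sending     */
} ipi_step;

/* Fill out[0..2] with INIT, SIPI, SIPI for one application processor.
 * trampoline_phys is where the real-mode bootstrap code was copied. */
void build_ap_startup(uint8_t apic_id, uint32_t trampoline_phys,
                      ipi_step out[3])
{
    assert((trampoline_phys & 0xFFFu) == 0);   /* must be page aligned  */
    assert(trampoline_phys < 0x100000u);       /* must lie below 1 MiB  */

    uint32_t high = (uint32_t)apic_id << 24;
    uint32_t sipi = ICR_DM_STARTUP | ICR_ASSERT | (trampoline_phys >> 12);

    out[0] = (ipi_step){ high, ICR_DM_INIT | ICR_ASSERT, 10000 };
    out[1] = (ipi_step){ high, sipi, 200 };
    out[2] = (ipi_step){ high, sipi, 200 };
}
```

With the bootstrap code at physical 0x8000, the AP starts executing in real mode at 0800:0000, which is why the code has to live in the first megabyte.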

Of course, the AP begins in real mode and needs to be configured with the usual IDT, GDT and paging setup. At this point, you typically cause it to wait until the rest of the system is ready, and then each processor enters the scheduling code.

The LAPIC also comes with its own timer. It is a very nice one indeed: you can typically achieve ~100ns resolution, you can set it up as one-shot or repeat, and you can tell it to interrupt after a given number of ticks. You do have to measure the CPU bus frequency in order to know how fast it ticks, though.
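The calibration arithmetic is simple enough to sketch. Assume you let the LAPIC timer count down while waiting a known interval on some reference clock (the PIT, say) and note how many ticks elapsed; both function names below are hypothetical. The measured rate already includes whatever divider you programmed, so keep the divider the same afterwards.

```c
#include <assert.h>
#include <stdint.h>

/* Effective LAPIC timer frequency in Hz, from a calibration run that saw
 * ticks_elapsed timer ticks during interval_us microseconds. */
uint64_t lapic_timer_hz(uint32_t ticks_elapsed, uint32_t interval_us)
{
    assert(interval_us > 0);
    return (uint64_t)ticks_elapsed * 1000000u / interval_us;
}

/* Initial-count value that makes the timer fire after period_us. */
uint32_t lapic_initial_count(uint64_t timer_hz, uint32_t period_us)
{
    uint64_t count = timer_hz * period_us / 1000000u;
    assert(count > 0 && count <= UINT32_MAX);  /* the register is 32 bits wide */
    return (uint32_t)count;
}
```

For example, 1,000,000 ticks in 10 ms works out to 100 MHz, and a 1 ms pre-emption tick at that rate needs an initial count of 100,000.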

So typically you have each LAPIC responsible for pre-emption on its own logical processor. Then as each processor enters the scheduler it can pick a new task off the waiting queue and switch to it (to give one naive and simple scheduler algorithm as example).
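The naive scheduler described above could look roughly like this: one shared FIFO of runnable task IDs, and each processor's timer handler puts the task it was running at the back and takes the one at the front. Everything here (`run_queue`, `runq_push`, `runq_pop`, `schedule`) is a made-up illustration, and a real kernel would need a spinlock around the queue because several processors enter it concurrently.

```c
#include <assert.h>

#define RUNQ_CAP 8

/* Fixed-size FIFO of runnable task IDs (task IDs are non-negative). */
typedef struct {
    int      tasks[RUNQ_CAP];
    unsigned head;    /* index of the oldest entry */
    unsigned count;   /* number of queued tasks    */
} run_queue;

/* Append a task; returns 0 on success, -1 if the queue is full. */
int runq_push(run_queue *q, int task)
{
    assert(task >= 0);
    if (q->count == RUNQ_CAP)
        return -1;
    q->tasks[(q->head + q->count) % RUNQ_CAP] = task;
    q->count++;
    return 0;
}

/* Remove and return the oldest task, or -1 if the queue is empty. */
int runq_pop(run_queue *q)
{
    if (q->count == 0)
        return -1;
    int task = q->tasks[q->head];
    q->head = (q->head + 1) % RUNQ_CAP;
    q->count--;
    return task;
}

/* Called from the timer interrupt on each processor. current is the task
 * that was running there, or -1 if the processor was idle. Returns the
 * task to run next, or -1 to stay idle. */
int schedule(run_queue *q, int current)
{
    if (current >= 0 && runq_push(q, current) != 0)
        return current;   /* queue full: keep running the same task */
    return runq_pop(q);
}
```

Affinity, in this picture, is just a rule about which tasks a given processor is allowed to pop.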

You can also set aside some designated vectors for your own IPI usage for more interesting interprocessor communication scenarios. For example, TLB shootdown must occur after modifying page tables and that involves holding up every CPU that might be affected, making the change, flushing all the TLB entries necessary, and then getting them going again.

The Programmable Interval Timer (PIT) in this case is made available for other use. Keep in mind that any IO IRQ like the PIT IRQ is going to be routed through what is called the IO-APIC. There can be multiple IO-APICs and they are connected to all the local APICs. The job of the IO-APIC is to route an IO IRQ to one or many LAPICs. You need to configure them at boot based on the information in the MP or ACPI tables. You can achieve various IRQ distribution protocols using the IO-APIC, but of course the easiest thing to do is send them all to one processor. The IO-APIC is usually mapped at physical 0xFEC00000, though the actual address should be taken from the MP/ACPI tables (the LAPIC base, by contrast, can be read from the IA32_APIC_BASE MSR).
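Routing an IRQ through the IO-APIC means filling in one 64-bit redirection table entry per input pin. The entries are reached indirectly: register index 0x10 + 2*pin holds the low 32 bits and the next index holds the high 32 bits. Here is a minimal sketch of the simplest case (fixed delivery, physical destination, edge-triggered, active high, so all the mode bits are zero). The function names are hypothetical, and actually writing the value through the IOREGSEL/IOWIN window is left out.

```c
#include <assert.h>
#include <stdint.h>

#define IOAPIC_REDTBL_BASE 0x10u          /* register index of entry 0, low half */
#define IOAPIC_MASKED      (1ull << 16)   /* set to keep the pin disabled        */

/* Register index of the low half of a pin's redirection entry.
 * The high half lives at the returned index + 1. */
uint32_t ioapic_redir_reg(uint8_t pin)
{
    return IOAPIC_REDTBL_BASE + 2u * pin;
}

/* Build a redirection entry that delivers the pin as the given vector
 * to a single LAPIC, identified by its APIC ID in the top byte. */
uint64_t ioapic_redir_entry(uint8_t vector, uint8_t dest_apic_id, int masked)
{
    assert(vector >= 32);  /* vectors 0..31 are reserved for CPU exceptions */

    uint64_t entry = vector;
    if (masked)
        entry |= IOAPIC_MASKED;
    entry |= (uint64_t)dest_apic_id << 56;
    return entry;
}
```

"Send them all to one processor" is then just every pin getting the same destination ID with a different vector.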

Most modern machines will support all this hardware. Sometimes you find on uniprocessor machines that certain features, like the LAPIC, are only enabled through the BIOS. On older machines (really old) you may not have any APICs at all. You have to decide if you want to support that.

The Intel manuals are invaluable resources about all of this except the IO-APIC. That is in a separate section. Also see OSDev.org wiki.

P.S. x2APIC is the newer specification for the LAPIC which removes many of the limitations of the original design.
 