Well, it's complicated, and it differs a lot by ISA, and even within an ISA.
I'm more familiar with Sun's SPARC products at this level of detail, so what I say might be Sun- or SPARC-specific in some places. I'll try to keep things as general as possible. x86 experts please jump in and correct me if I overstep the measly capabilities of x86. 🙂
Quick note: Back in the days of SMPs, processors were on discrete chips. There was no notion of a "core". A lot of inter-core communication works exactly the same today as it did then, as far as the interface is concerned. Hence, I use 'processor' in the discussion below to mean 'core' in most cases. When I mean the whole chip, I'll say 'chip' or 'CPU'.
Boot
Usually, at reset time, only processor 0 comes alive and runs code. It starts at some predetermined address, in some predetermined state. Usually that memory area is backed by a ROM that tells the processor where to jump to find the boot ROM.
The other processors are in a halt state at this point. As part of the boot process, processor 0 populates the interrupt vector tables. Once they are populated, processor 0 sends inter-processor interrupts (IPIs) to the other processors to bring them out of halt. Each one starts execution at the relevant interrupt handler and essentially 'becomes' an OS thread.
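The boot handshake above can be modeled in a few lines of C. This is only a single-threaded toy simulation, not real boot code: `cpus`, `vector_table`, `VEC_WAKEUP`, `send_ipi`, and `boot_all` are all names made up for illustration, and a real platform would deliver the IPI through its interrupt controller rather than a direct function call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_CPUS    4   /* hypothetical core count */
#define NUM_VECTORS 1
#define VEC_WAKEUP  0   /* hypothetical vector number for the wake-up IPI */

typedef void (*handler_t)(int cpu);

struct cpu_state {
    bool halted;  /* true while the core sits in its reset-time halt state */
    bool in_os;   /* true once the core is running OS code */
};

static struct cpu_state cpus[NUM_CPUS];
static handler_t vector_table[NUM_VECTORS];

/* Handler a woken core starts executing in: it "becomes" an OS thread. */
static void wakeup_handler(int cpu)
{
    cpus[cpu].in_os = true;
}

/* Simulated IPI: take the target out of halt and run its handler. */
static void send_ipi(int target, int vector)
{
    assert(target >= 0 && target < NUM_CPUS);
    assert(vector >= 0 && vector < NUM_VECTORS);
    assert(vector_table[vector] != NULL);  /* table must be populated first */
    cpus[target].halted = false;
    vector_table[vector](target);
}

/* Boot sequence as seen from processor 0. Returns the number of running cores. */
int boot_all(void)
{
    /* Reset state: only processor 0 is alive. */
    for (int i = 0; i < NUM_CPUS; i++) {
        cpus[i].halted = (i != 0);
        cpus[i].in_os = false;
    }
    cpus[0].in_os = true;

    /* Step 1: populate the interrupt vector table. */
    vector_table[VEC_WAKEUP] = wakeup_handler;

    /* Step 2: only then poke the other cores out of halt. */
    for (int cpu = 1; cpu < NUM_CPUS; cpu++)
        send_ipi(cpu, VEC_WAKEUP);

    int running = 0;
    for (int i = 0; i < NUM_CPUS; i++)
        if (!cpus[i].halted && cpus[i].in_os)
            running++;
    return running;
}
```

The ordering is the point: the vector table has to be in place before any IPI goes out, otherwise the woken cores would have nowhere sensible to start executing.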
Normal Operation
There are two main approaches for handling interrupts, including timer interrupts (usually global) on threaded machines.
1. Give all non-IPI interrupts to processor 0. Have processor 0 reassign work to other cores as necessary with IPIs.
2. Rotate interrupts among all processors for load balancing. Literally point the interrupt unit to a new processor after each interrupt.
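The difference between the two approaches comes down to how the interrupt unit picks its target. Here is a rough sketch in C; `struct intr_unit`, `route_fixed`, and `route_rotating` are hypothetical names, and real interrupt units do this in hardware or via platform-specific registers.

```c
#include <assert.h>

/* Hypothetical model of an interrupt unit's routing state. */
struct intr_unit {
    int num_cpus;  /* number of processors that can take interrupts */
    int next;      /* processor that will receive the next interrupt */
};

/* Approach 1: every non-IPI interrupt goes to processor 0,
 * which then redistributes work with IPIs as needed. */
int route_fixed(const struct intr_unit *u)
{
    (void)u;
    return 0;
}

/* Approach 2: deliver to the current target, then point the
 * unit at the next processor (simple round-robin). */
int route_rotating(struct intr_unit *u)
{
    assert(u->num_cpus > 0);
    int target = u->next;
    u->next = (u->next + 1) % u->num_cpus;
    return target;
}
```

With three processors, successive calls to `route_rotating` yield 0, 1, 2, and then wrap back to 0, while `route_fixed` always yields 0.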
Sorry, I'm not sure which OSes and platforms use which approach. But the high-level point is that as soon as the OS gains control of any one core (e.g., from a timer going off or a system call), it can send IPIs to gain control of the others.
The exact mechanism for sending IPIs is platform-specific. It is usually a memory-mapped privileged interrupt controller. Basically, an IPI is a way for a core running privileged code to poke another core, running arbitrary code. They're commonly used for TLB shootdown, scheduling, etc.
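To give a feel for what "memory-mapped" means here, this is a sketch of sending an IPI by writing to controller registers. The register layout (`struct ipi_regs`), the `IPI_CMD_SEND` bit, and `ipi_send` are all invented for illustration and do not match any real controller; on real hardware `regs` would point at a privileged MMIO region, whereas below it can be pointed at an ordinary struct so the code runs anywhere.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register block of an interrupt controller. */
struct ipi_regs {
    volatile uint32_t target;   /* ID of the core to interrupt */
    volatile uint32_t command;  /* writing this register fires the IPI */
};

#define IPI_CMD_SEND 0x80000000u  /* hypothetical "send now" bit */

/* Poke another core: select the target, then write the command
 * register with the vector number to deliver. */
void ipi_send(struct ipi_regs *regs, uint32_t target_cpu, uint8_t vector)
{
    assert(regs != NULL);
    regs->target = target_cpu;
    regs->command = IPI_CMD_SEND | vector;
}
```

The `volatile` qualifier matters for real device registers: it keeps the compiler from dropping or reordering the two stores relative to each other, since each write has a side effect in the device.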
(This last part might be specific to SPARC/Solaris hypervisor code): 'Core IDs' come in two flavors. There is a physical ID, known (essentially) only to the hardware, based on where the core lies in the virtual backplane, and there is a virtual ID visible to the OS. As part of the boot process, the booting core establishes the virtual-ID-to-physical-ID mappings before activating the other cores, usually just by assigning contiguous numbers 1..N (0 is already taken by the booting core) to the (potentially non-contiguous) physical IDs. IPIs are sent by virtual ID.
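The ID assignment can be sketched as follows. This is an illustration of the idea only, not actual hypervisor code; `build_id_map` and its parameters are hypothetical names.

```c
#include <assert.h>

/* Build a virtual-to-physical core ID table.
 *   phys_ids     : physical IDs found on the machine (may be non-contiguous)
 *   count        : number of entries in phys_ids
 *   boot_phys    : physical ID of the booting core
 *   virt_to_phys : output table with room for `count` entries
 * The booting core gets virtual ID 0; the others get 1..N in discovery order.
 * Returns the number of virtual IDs assigned, or -1 if the booting core
 * is not in phys_ids. */
int build_id_map(const int *phys_ids, int count, int boot_phys,
                 int *virt_to_phys)
{
    assert(count > 0);

    int boot_idx = -1;
    for (int i = 0; i < count; i++) {
        if (phys_ids[i] == boot_phys) {
            boot_idx = i;
            break;
        }
    }
    if (boot_idx < 0)
        return -1;

    virt_to_phys[0] = boot_phys;   /* 0 is taken by the booting core */
    int next = 1;
    for (int i = 0; i < count; i++) {
        if (i != boot_idx)
            virt_to_phys[next++] = phys_ids[i];
    }
    return next;
}
```

For example, with physical IDs {3, 8, 9, 17} and core 8 doing the boot, the table comes out as virtual 0 → 8, 1 → 3, 2 → 9, 3 → 17. An IPI to virtual ID 2 would then be delivered to physical core 9.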
Lastly, there is another communication mechanism available, but it is seldom used for explicit communication in OS code: shared memory (duh). Once processors can address the same memory, they could use shared memory to communicate instead (though shared memory provides no way to interrupt user code, for instance).
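For comparison with IPIs, here is a minimal sketch of shared-memory communication: a one-slot mailbox using C11 atomics. It assumes exactly one producer core and one consumer core, and the names `struct mailbox`, `mailbox_post`, and `mailbox_poll` are made up for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* One-slot mailbox living in memory both processors can address. */
struct mailbox {
    int payload;
    atomic_bool full;  /* true while payload holds an unread message */
};

/* Producer side: returns false if the previous message is still unread. */
bool mailbox_post(struct mailbox *m, int value)
{
    assert(m != NULL);
    if (atomic_load_explicit(&m->full, memory_order_acquire))
        return false;
    m->payload = value;
    /* Release: the payload write becomes visible before the flag does. */
    atomic_store_explicit(&m->full, true, memory_order_release);
    return true;
}

/* Consumer side: returns false if nothing has been posted. */
bool mailbox_poll(struct mailbox *m, int *out)
{
    assert(m != NULL && out != NULL);
    if (!atomic_load_explicit(&m->full, memory_order_acquire))
        return false;
    *out = m->payload;
    atomic_store_explicit(&m->full, false, memory_order_release);
    return true;
}
```

Note that the consumer only sees the message when it chooses to call `mailbox_poll`. Nothing here can force the other processor to stop what it is doing, which is exactly the limitation compared with an IPI.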