Splitting One Process Over Many Cores

firewolfsm

Golden Member
Oct 16, 2005
1,848
29
91
After thinking it over, this idea could hurt the performance of multi-threaded applications, but it might be worth it anyway, assuming it's even architecturally possible; I don't know whether it is.

I don't know too much about CPU architecture, but enough for this idea to come to me. I'd like your opinions on it.

What if we had a thread of instructions go through a complex branch predictor focused on out-of-order execution as far into the future as possible (I have no idea how many instructions that would be), and then had it deal out instructions from that same thread to separate cores? The results would then converge at a more powerful core for error checking and correction. This would need the L1 cache shared between the first cores, and then the L2 shared by all of them. Couldn't this effectively spread something like SuperPi over many cores?
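Roughly what I'm picturing, as a toy software analogue (purely illustrative - the function names are made up, and real hardware obviously wouldn't work this way):

/* Toy illustration only: compute both possible continuations of a branch
 * and keep whichever one turns out to be correct, the way a second core
 * might run ahead on the "other" path. Not how real hardware works. */
#include <stdio.h>

int path_taken(int x)     { return x * 2 + 1; }  /* work if the branch is taken */
int path_not_taken(int x) { return x - 3; }      /* work if it is not taken     */

int main(void)
{
    int x = 10;

    /* "Speculate": do the work for both outcomes up front,
     * as if two cores each ran one continuation. */
    int result_if_taken     = path_taken(x);
    int result_if_not_taken = path_not_taken(x);

    /* Later, the branch condition resolves; discard the wrong result. */
    int condition = (x > 5);
    int result = condition ? result_if_taken : result_if_not_taken;

    printf("kept result: %d\n", result);
    return 0;
}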

Eh, my idea is very crude and underdeveloped, but I think I have something here.
 

Nathelion

Senior member
Jan 30, 2006
697
1
0
So basically you're talking about having extra processors execute various probable branches of continued execution, like, before they have actually happened?
As a first thought, I'd think there would be far too many permutations for any kind of effective "branch predictor" to work, but I'm not an expert either.
 

firewolfsm

Golden Member
Oct 16, 2005
1,848
29
91
Well, we have branch predictors and out-of-order execution now; this would just be an enhancement of that.
 

Nathelion

Senior member
Jan 30, 2006
697
1
0
Maybe it could work for something like boolean operations, where the set of outcomes is very limited. But if you, say, add two 32-bit words, aren't there 2^32 possible outcomes that you'd have to predict?
 

firewolfsm

Golden Member
Oct 16, 2005
1,848
29
91
Well...as I said, we have predictors that do that now; they look at past operations and narrow it down.
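For what it's worth, the simplest version of "looking at past operations" is a 2-bit saturating counter per branch - the textbook scheme, not any particular CPU's design. A minimal sketch:

/* Textbook 2-bit saturating counter predictor: predict "taken" when the
 * counter is 2 or 3, then nudge the counter toward the actual outcome.
 * This is the classic scheme, not a model of any specific CPU. */
#include <stdio.h>

int main(void)
{
    int counter = 2;                         /* start at "weakly taken" */
    /* A branch that is taken most of the time, with one blip. */
    int actual[] = { 1, 1, 1, 0, 1, 1, 1, 1 };
    int n = sizeof actual / sizeof actual[0];
    int correct = 0;

    for (int i = 0; i < n; i++) {
        int prediction = (counter >= 2);     /* predict taken? */
        if (prediction == actual[i])
            correct++;
        /* Update the history, saturating at 0 and 3. */
        if (actual[i] && counter < 3) counter++;
        if (!actual[i] && counter > 0) counter--;
    }

    printf("correct predictions: %d / %d\n", correct, n);
    return 0;
}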
 

jones377

Senior member
May 2, 2004
462
64
91
It's called Dynamic Multithreading (DMT) and has been a concept for quite some time, but it has never really left academia or R&D and may never do so. The problem is that executing both sides of a branch wastes a lot of energy on work that gets thrown away, at a time when everything is being done to minimize power consumption. Then consider that roughly every 5-6 instructions there's a branch, and you suddenly have far more than just 2 paths to deal with in a normal instruction stream.
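To put rough numbers on that (back-of-the-envelope, assuming the usual figure of about one branch every 5 instructions):

/* Back-of-the-envelope: if every branch forks execution into two paths,
 * the number of live paths grows as 2^branches. Assumes ~1 branch per
 * 5 instructions, which is just the usual rule of thumb. */
#include <stdio.h>

int main(void)
{
    int instructions_per_branch = 5;

    for (int window = 10; window <= 60; window += 10) {
        int branches = window / instructions_per_branch;
        long paths = 1L << branches;   /* 2^branches speculative paths */
        printf("%2d instructions deep: ~%2d branches -> %ld paths\n",
               window, branches, paths);
    }
    return 0;
}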
 

Modelworks

Lifer
Feb 22, 2007
16,240
7
76
This was done somewhat already in BeOS.
It was capable of doing amazing things with multiple threads and processors, way ahead of its time.

It also had preemptive multitasking, something that programmers are really just now starting to take a hard look at since multicore CPUs are becoming common.

Look up BeOS on Google. It was really great at multitasking.

 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Read up on Multiscalar. It's from 1994, so there have probably been advances since then. Google speculative multithreading (the term used in academia for taking a single-threaded program and parallelizing it with hardware) - the first result is very good. The second result (click "view as HTML") also looks likely to be good.

Originally posted by: Nathelion
So basically you're talking about having extra processors execute various probable branches of continued execution, like, before they have actually happened?
As a first thought, I'd think there would be far too many permutations for any kind of effective "branch predictor" to work, but I'm not an expert either.

On average you have a branch every 5 instructions. Let's say your processor has 60ish instructions in flight, so there are about 12 branches. That's 2^12 (4096) possible outcomes... even with 16 processors, you haven't made much of a dent (and yet you've increased power consumption by 16X!). For what it's worth, modern branch predictors get over 90% accuracy for most tasks, and perfect branch prediction wouldn't buy back that much performance (IIRC, it's generally going to win you <=20%).
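The same arithmetic in code form, plus what 16 cores actually cover and how often a 90%-accurate predictor gets a whole 12-branch window right (treating the branches as independent, which is a simplification):

/* Restating the arithmetic above: 60 instructions in flight, a branch
 * every ~5 instructions, and what 16 cores actually buy you. The 90%
 * accuracy figure is the one quoted in the post. */
#include <stdio.h>

int main(void)
{
    int in_flight = 60;
    int branches  = in_flight / 5;        /* ~12 branches in flight        */
    long paths    = 1L << branches;       /* 2^12 = 4096 possible outcomes */
    int cores     = 16;

    printf("possible paths: %ld\n", paths);
    printf("covered by %d cores: %.2f%% (at roughly %dx the power)\n",
           cores, 100.0 * cores / paths, cores);

    /* Chance that all 12 in-flight predictions are right at 90% accuracy,
     * assuming independent branches (a simplification). */
    double all_right = 1.0;
    for (int i = 0; i < branches; i++)
        all_right *= 0.90;
    printf("P(all %d predictions correct): %.1f%%\n",
           branches, 100.0 * all_right);
    return 0;
}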

Originally posted by: Modelworks
This was done somewhat already in BeOS.
BeOS did not do this, or anything like it. BeOS was heavily threaded, but all the threading had to be done by the programmer (like with every other platform; BeOS just forced you to use threads more).
It also had preemptive multitasking, something that programmers are really just now starting to take a hard look at since multicore CPUs are becoming common.
Windows 95 had preemptive multitasking (~1995). Linux always had preemptive multitasking (~1991). The only common OS that didn't was Mac OS (until OS X).
 

Modelworks

Lifer
Feb 22, 2007
16,240
7
76
The reason I said that programmers are just now starting to look at preemptive multitasking is that with a single-core CPU it adds very little gain to a system.

Currently Windows still runs like a single-core machine even with multiple cores available.
How many times has an application frozen the system?
How many times has an application doing a CPU-intensive task been able to make the system unresponsive?

A properly implemented preemptive system would not do this.
Windows has preemptive multitasking, but it implements it very poorly.
Multiple cores change everything about how a system like this is implemented.

 

CTho9305

Elite Member
Jul 26, 2000
9,214
1
81
Originally posted by: Modelworks
The reason I said that programmers are just now starting to look at preemptive multitasking is that with a single-core CPU it adds very little gain to a system.
That's not true, as you yourself point out below - without preemptive multitasking, even on a single core, one program can easily hang the system. Preemptive multitasking makes a system more robust, and can also make it feel more responsive (e.g. having the OS preempt the current process when the user interacts with a different one - supposedly the responsiveness of desktop Linux improved significantly when the kernel was made preemptible too).

Currently Windows still runs like a single-core machine even with multiple cores available.
I haven't noticed that in a very long time (pre-XP). With the NT-based OSes, unless you have disk thrashing or a higher-priority process eating up CPU time, one process is unlikely to do bad things.

Now, a program that allocates a lot of memory and causes disk thrashing can still bring a system to its knees, but that's because I/O accesses weren't prioritized (Vista is supposed to take care of that, though). The CPU is sitting pretty much idle when that's happening - there's not much to gain from allocating CPU time differently when the disk is thrashing.
 

dmens

Platinum Member
Mar 18, 2005
2,275
965
136
Using multicore to go down speculative execution paths is very wasteful, and given current industry trends there will probably never be any implementation in the consumer market. Branch predictors keep getting better; AFAIK the last P4 got it wrong once every 500 to 5000 cycles depending on the code. C2D is better, and its restart penalty is significantly lower than the P4's. IMO, the ROI on multiple recursive speculative machines is very low.

On the other hand, helper threads have been around for some time and those can help significantly. That might be a more efficient way to utilize multiple cores.
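A rough sketch of the helper-thread pattern in software terms (hypothetical code, just to show the shape: a second thread runs ahead of the main one and touches data so it is warm in cache when the main thread gets there):

/* Sketch of a "helper thread": a second thread walks the same array a bit
 * ahead of the main thread and prefetches it, so the main thread's loads
 * tend to hit in cache. Illustrative only; real helper threads are usually
 * generated by the compiler or hardware, not written by hand like this. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define N (1 << 22)

static int data[N];

/* Helper: run ahead and pull cache lines in before the main loop gets there. */
static void *prefetch_helper(void *arg)
{
    (void)arg;
    for (int i = 0; i < N; i += 16)          /* roughly one touch per cache line */
        __builtin_prefetch(&data[i], 0, 1);  /* GCC/Clang prefetch hint          */
    return NULL;
}

int main(void)
{
    for (int i = 0; i < N; i++)
        data[i] = rand();

    pthread_t helper;
    pthread_create(&helper, NULL, prefetch_helper, NULL);

    /* Main thread does the real work; ideally its loads now hit in cache. */
    long long sum = 0;
    for (int i = 0; i < N; i++)
        sum += data[i];

    pthread_join(helper, NULL);
    printf("sum = %lld\n", sum);
    return 0;
}

In this toy case a hardware prefetcher would handle a sequential scan anyway; helper threads tend to matter more for irregular access patterns like pointer chasing.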