degibson
Golden Member
- Mar 21, 2008
Originally posted by: Modelworks
The answer I got a lot of the time was that the program was designed with single cores in mind and it is too hard to go back and break up the task to target multiple cores. I think the holy grail will be a processor that can divide a single task among multiple cores without the aid of the programmer. Until then I am really starting to look more at the GPU and things like OpenCL, which just released its specs. http://www.khronos.org/registr...pecs/opencl-1.0.29.pdf
For those that are inclined, look into a research proposal from the 90's called 'Multiscalar'. Just google it -- it will pop up.
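To make concrete what "going back and breaking up the task" actually demands of the programmer, here's a minimal sketch (my own illustration, not from Modelworks' post) that divides one summation across cores with Python's `multiprocessing` -- the chunking step is exactly the manual work that a Multiscalar-style processor would do for you:

```python
from multiprocessing import Pool

def partial_sum(chunk):
    # Each worker sums its own slice independently -- no shared state.
    return sum(chunk)

def parallel_sum(data, workers=4):
    # The programmer must split the single task into independent
    # chunks by hand; the hardware doesn't do this for us today.
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(workers) as pool:
        # Combine the per-core partial results.
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    print(parallel_sum(list(range(1000))))  # same answer as sum(range(1000))
```

Trivial here, of course -- the hard part in real programs is finding chunks that are actually independent.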
Originally posted by: Ken g6
There's a lot more history to CPU design than just adding instructions or the GHz race.
The funny thing about all this history is that it led back to thread-level parallelism. It turns out that no matter what the advances in computer architecture were, the advances in VLSI always trumped them -- hence the GHz race.
Unfortunately, we then hit the power wall. We just can't clock transistors that fast and provide them with power at the same time. Major bummer. Stupid CMOS...
Believe me, I'd love to invent an architecture that trumps all previous architectures at leveraging parallelism in serial code. It turns out that's a really hard thing to do.
Originally posted by: Markbnj
I bring this up because the UI and the way it interacts with the application and user offers a whole set of its own opportunities and challenges for parallelism.
This is actually pretty awesome. Here's an idea: what can we do to leverage the existing expertise in UI optimization and turn that into performance optimization? It's actually the same insight, but for a different reason: instead of optimizing the user experience through lower-latency I/O handling, let's use that same expertise to optimize computation. Key questions:
*Will that expertise survive a few orders of magnitude of latency? For a delay to be noticeable to a human, it has to take ~100 ms. For thread-level parallelism to be useful, we're talking 1-10 ms (or more) worth of computation.
*Will the same intuition apply? If so, how do we express it?
Orthogonally: Would the entire thing be fixed if we changed the event handling model to allow simultaneous handling of multiple events?
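For the sake of argument, here's a rough sketch of what that changed event model might look like (hypothetical handler names, my own illustration): instead of the usual one-at-a-time event loop, each event's handler is dispatched to a thread pool so independent events can overlap.

```python
from concurrent.futures import ThreadPoolExecutor

def run_events(events, handlers, workers=4):
    # Hypothetical concurrent event loop: submit every handler to a
    # pool instead of running them serially on the UI thread.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(handlers[kind], payload)
                   for kind, payload in events]
        # Collecting in submission order keeps results deterministic
        # even though the handlers themselves ran concurrently.
        return [f.result() for f in futures]

handlers = {
    "click": lambda pos: "clicked at %s,%s" % pos,
    "key":   lambda ch:  "pressed %s" % ch,
}
events = [("click", (10, 20)), ("key", "a"), ("click", (0, 0))]
print(run_events(events, handlers))
```

The catch, of course, is the same one as always: this only wins if the handlers really are independent, which today's single-threaded event models let UI programmers assume for free.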
