I'll write this reply from the PC-world perspective. IBM has been doing multi-core and multi-way supercomputing for years, as noted above. Scientific workloads can be broken up nicely because there is so much independent data to crunch, but for personal uses it's hard to get many cores to help with a single task. We will get there, but probably very slowly.
Writing a multithreaded app is, in general, much harder than writing a single-threaded one if you want to take full advantage of all the processors. It's easy to write a Windows app (say, MS Word) that runs a spell checker in a separate background thread, but for the most part Word doesn't use much CPU anyway while it's waiting for you to type. The key to having multiple CPUs/cores is taking full advantage of them, and there are a number of obstacles that must be overcome before a program can be threaded easily.
The first thing you must realize is that a modern CPU is designed to run one stream of instructions. If you have 2 CPUs running one program, they aren't designed to help each other at a low, hardware level, so there is a lot of system overhead, like context switching, that must be managed by the OS.
Another way of saying it: having one CPU do one thing is easy, and having many CPUs each do one of many things is also easy. Having one CPU do many things is what we do today in a multitasking Windows OS, and it works pretty well, mainly because most things don't hog the CPU. As we move to multicore, things are turned on their head: now we want one thing to be done by many CPUs, and there are some basic computer science reasons why that is very hard, if not impossible in some cases.
A practical problem is that there is no universal standard for writing multithreaded code. If you write a C++ app on Windows, you make it multithreaded by calling a Windows API to create a thread. If you then want to port that app to Linux, you have to use a different library (pthreads) for your threads. This is not fun work to do. If we do end up with scaled multi-core CPUs (4, 8, 16 cores), it would be extremely useful if the x86 instruction set added hardware-level instructions for creating and managing threads in assembly, so there would be no need to ratify a software standard for programmers. That would require pushing some tasks Windows does today (like task scheduling) down to the hardware level. Right now, if you write a Windows app, it's surprisingly awkward to find out at run-time how many CPUs you have and create that many threads. That's some nasty low-level system code to conjure up.
When you start programming a multithreaded app, the big problem you have to worry about is serialization of data. If you have 2 threads that need to write to one piece of data in memory, they have to wait for each other, and which one gets to go first? This is done by blocking one thread and having it wait on a lock. In your example of a spell checker that splits the pages in two, the efficiency of the code is determined by how your data is structured and how each CPU gets access to that data. If all the words are stored in memory, there are a number of algorithms you could use to split the work into 2 threads, but each thread would have to know where to stop and touch only its own data. On some systems you may even use double the RAM, depending on how memory sharing is implemented. If, on the other hand, the spell checker is reading the words from a hard disk, it has to do a lot of I/O reads to check each word and will probably buffer some of it. In that case, if you have 2 threads doing I/O reads from different places, performance will actually get worse: your single disk is now doing random access instead of sequential access, and you have to manage multiple buffers that could overlap or become outdated.
Simply put, multithreading makes things complicated. An analogy: imagine a buddy comes along to help you with your everyday tasks, and you want to get as much done as possible. For some things you don't need your buddy at all (say, performing necessary bodily functions). For other things, you'd have to tell your buddy so much before he could help that it's not worth it, and you'd rather just do them yourself. Otherwise your buddy can be helpful, but having him do the same thing you are doing often creates its own problems. If you think about it, it comes down to how clever you and your buddy are at dividing up the work and helping each other in optimal ways.
There's a lot of CS you can get into to answer your question. Look up semaphores and race conditions.
In case you or anyone else is curious, games tend to be highly single-threaded and very hard to multithread. The reason is that every game is just one big (single) loop that repeats a sequence of steps. And since (most) games are not locked to a particular frame rate, a game is constantly repeating those steps, never giving the CPU a break. A typical game loop for an FPS might look like this:
while(1)
{
    // run event handlers to get user input from mouse/kbd
    // update game world with new user input
    // call AI functions to determine new enemy positions (can be CPU intensive)
    // update world with AI positions
    // reposition moving objects in the world
    // check world for collisions (CPU intensive, 3D space is hard to search)
    // resolve collisions, redetermine positions
    // draw graphics for this frame (CPU / GPU intensive)
}
Note that each CPU-intensive step depends on the one that came before it.
Since we are constantly repeating this loop, and there is only one game world that is constantly changing, multithreading is very hard. I'm not saying it can't be done, but you need to be a skilled programmer, well above the average coder of today. The problem is keeping everything in sync and letting multiple threads access the same game world without stepping on each other's toes. Say thread 1 is the main loop and has already resolved this frame's collisions, while thread 2 is making AI decisions from the old game state: what happens if one thread outraces the other and the AI falls minutes behind? It's tricky. It can be done, but only in a complicated way. There are things that can be pushed into separate threads, but often the question is "why?", since they use very little CPU to begin with, like networking and I/O. To really take full advantage, you need to split up the CPU-intensive tasks of the game loop itself, and keeping them in sync is tricky to say the least.
By the time we have 16-way cores, we may have some games that are good at using 2-4 of your cores. Hardware develops fast, software develops slowly.
I don't mean to rain on anybody's parade, but I'm hoping I can offer some insight.
-Titan