I agree completely. Bring on the cores, and a distributed app / OS model! Linus is IMHO too steeped in existing software architecture - e.g. monolithic apps and kernels. Once you break those up into small pieces of code that all call each other, you open up far more opportunities for effective multi-processing.
Linux typically scales out better than any other OS except FreeBSD (unless that's changed very recently), and a lot of work has gone into better performance for parallel and concurrent programs, with more to come. In general, the Unix way of getting work done is to call other processes, and outside of the history-repeating Systemd, that approach is alive and well.
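To make that concrete, here's roughly what the "call other processes" style looks like under the hood on a POSIX system: the by-hand version of the shell pipeline 'ls | wc -l'. Treat it as a minimal sketch (the two commands are just examples, error handling is kept short), but notice there's no concurrency code on our side; the kernel schedules both children on whatever cores are free.

    /* Minimal sketch: build "ls | wc -l" by hand with pipe/fork/exec.
     * The commands are just examples; error handling is kept short. */
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        int fd[2];
        if (pipe(fd) == -1) { perror("pipe"); return 1; }

        pid_t producer = fork();
        if (producer == 0) {                /* child 1: ls */
            dup2(fd[1], STDOUT_FILENO);     /* its stdout feeds the pipe */
            close(fd[0]); close(fd[1]);
            execlp("ls", "ls", (char *)NULL);
            _exit(127);                     /* exec failed */
        }

        pid_t consumer = fork();
        if (consumer == 0) {                /* child 2: wc -l */
            dup2(fd[0], STDIN_FILENO);      /* its stdin drains the pipe */
            close(fd[0]); close(fd[1]);
            execlp("wc", "wc", "-l", (char *)NULL);
            _exit(127);
        }

        close(fd[0]); close(fd[1]);         /* parent holds no pipe ends */
        waitpid(producer, NULL, 0);
        waitpid(consumer, NULL, 0);
        return 0;
    }

Both children run concurrently, and the OS handles the scheduling and the pipe's flow control, which is the whole point.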
It's also not new. Doing it for higher throughput with SMP is new, but doing it for saner software development, and to let the OS handle the concurrency, is old. What is new is remaking software with the mainframe mindset, but on modern commodity software and hardware, along with the assumption that everything is more efficient if it runs inside a single service/runtime. Windows can't help it, to a degree, but userland on Linux is, by and large, repeating history out of ignorance of it.
Monolithic vs. micro really has nothing to do with your end result, so long as it works (though a microkernel OS has no place in a general-purpose computer, due to IPC issues that have yet to be resolved in a suitable fashion). Different tasks have different scheduling needs, and may have different programs and libraries to make use of. Sometimes a pipe is as easy and good as anything else (if it can work, it's usually the easiest way). Sometimes it can't be used. Sometimes it's too slow. And so on. A single C/C++ program managing its own memory and scheduling will generally be the hardest way to do any particular job, but sometimes it might be the best or only way (depending on the libraries available, it could actually be the easy way, such as when building a GUI-based program with Qt). And sometimes it either can't be done, or will take enough time/money to not be worth doing.
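For contrast with the pipeline sketch above, here's the do-it-yourself end of that spectrum: the same kind of fan-out, but managed by hand with pthreads inside one program. It's only a sketch, and the worker function is a made-up placeholder; the point is that even the trivial case makes you own thread creation, joining, and result memory yourself.

    /* Sketch of the "manage it yourself" approach: fan work out to threads
     * inside one program. The worker's job (square its slot number) is a
     * made-up placeholder. Build with -pthread. */
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define NWORKERS 4

    static void *worker(void *arg)
    {
        long slot = (long)arg;
        long *result = malloc(sizeof *result);  /* we own this memory now */
        *result = slot * slot;
        return result;
    }

    int main(void)
    {
        pthread_t threads[NWORKERS];

        for (long i = 0; i < NWORKERS; i++)
            pthread_create(&threads[i], NULL, worker, (void *)i);

        for (long i = 0; i < NWORKERS; i++) {
            void *result;
            pthread_join(threads[i], &result);  /* wait and collect */
            printf("worker %ld -> %ld\n", i, *(long *)result);
            free(result);
        }
        return 0;
    }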
Game engines have tens of millions of dollars, sometimes more, going into development, in the knowledge or hope that licensing will make that cost back, so anything that may provide tangible performance benefits is worth it. Plenty of software is built on a much smaller scale, or with little to gain from massive rewriting.
If you're building new software, and there are places where it could scale out, even on a single PC, it would be stupid not to at least build in provisions for doing so, even if you leave the actual work for a later date. Not doing so will put you in a place like Firefox: able to use multiple cores somewhat, because some concurrency was hacked on after the fact, but whenever it's slow, it's always that one main thread blocking the rest. But even then, your program might not need it enough to be worth doing the work to actually make it scale out.
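One cheap way to "build in provisions" is to keep a narrow seam in the code: independent work items go through a single function, run serially today, so its insides can later be swapped for threads or worker processes without touching any callers. The names here (task, run_tasks) are mine, not from any library; it's just a sketch of the shape.

    /* Sketch: a narrow seam for future scaling. Names (task, run_tasks) are
     * made up for illustration; today it runs serially on the calling thread. */
    #include <stddef.h>
    #include <stdio.h>

    typedef struct {
        void (*fn)(void *ctx);   /* the work itself */
        void *ctx;               /* its private data: no shared mutable state */
    } task;

    /* Today: run everything in order, on the calling thread. Later: swap this
     * body for a thread pool or worker processes; callers never change. */
    void run_tasks(task *tasks, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            tasks[i].fn(tasks[i].ctx);
    }

    static void greet(void *ctx) { printf("hello from task %s\n", (char *)ctx); }

    int main(void)
    {
        task jobs[] = { { greet, "A" }, { greet, "B" } };
        run_tasks(jobs, sizeof jobs / sizeof jobs[0]);
        return 0;
    }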