I don't quite follow the part in bold. Processing in parallel always means more total work: the overhead of creating the parallel threads and feeding them data, plus the aggregation of thread results and the finalization of the parallelized task's output.
![equation: Almasi and Gottlieb's speedup formula with an interprocessor-communication (IPC) overhead term]()
At best the overhead (signified by Almasi and Gottlieb's IPC term in the equation above) is negligible and the compute time asymptotically approaches that predicted by Amdahl's Law. Only then can you start to speak of the code being so perfectly parallelized that Amdahl's Law asymptotically approaches flat-out serial code on a commensurately faster processor.
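For reference, here is a sketch of the relationship being described, assuming the standard Almasi–Gottlieb generalization of Amdahl's Law (the exact notation in the image above may differ). With serial fraction $s$ and $N$ processors:

```latex
% Amdahl's Law: ideal speedup with serial fraction s on N processors
S_{\text{Amdahl}}(N) = \frac{1}{s + \frac{1-s}{N}}

% Almasi and Gottlieb add an interprocessor-communication overhead term:
S(N) = \frac{1}{s + \frac{1-s}{N} + \mathrm{IPC}(N)}
```

As $\mathrm{IPC}(N) \to 0$, the second expression approaches the first, which is the "at best" case described above.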
I think there's some legitimacy to the claim: in the same way that a lot of tasks are inherently serial (or have a dominating serial component), some tasks have inherent parallelism and are hard to write as a single sequential task. Multithreading (within a program, as opposed to multiprocessing) was used on processors long before there were multiple hardware cores or threads to benefit from it.
To take a really low-level example: say you have a system that continually reads in data from some sensor, processes it (filtering, etc.), and sends it out via some port. Let's also say there's no ability to handle this in hardware using something like DMA. To handle everything happening at the same time, you either need threads/interrupts to switch context behind the program's back, or the program has to have state machines that multiplex all this stuff into timeslices. The point is, you're going to absorb some overhead from handling a bunch of things at once regardless of whether you have multiple processors.
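A minimal sketch of the single-threaded state-machine approach, with the sensor and output port simulated by queues (all names here are hypothetical, just to illustrate the timeslice multiplexing):

```python
# Hypothetical sketch: multiplexing read/filter/send into timeslices on one
# thread, with no interrupts or DMA. The "sensor" and "port" are simulated.
from collections import deque

def run(samples, window=3):
    """Round-robin 'scheduler': each loop pass gives every stage one timeslice."""
    raw = deque(samples)          # stands in for the sensor FIFO
    filtered = deque()            # queue between the filter and output stages
    history = deque(maxlen=window)
    out = []                      # stands in for the output port

    while raw or filtered:
        # Timeslice 1: read one sample from the "sensor"
        if raw:
            history.append(raw.popleft())
            # Timeslice 2: filter it (moving average over recent samples)
            filtered.append(sum(history) / len(history))
        # Timeslice 3: send one filtered value out the "port"
        if filtered:
            out.append(filtered.popleft())
    return out

print(run([3, 3, 3, 9]))  # → [3.0, 3.0, 3.0, 5.0]
```

The overhead here is the bookkeeping itself: every pass through the loop pays for the queue shuffling and stage checks, which is exactly the "absorb some overhead regardless" point.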
Having multiple processors may involve more overhead, but it could actually involve less. Because the code can run on all cores simultaneously, there can be no context-switching overhead on individual cores if you lock each thread to a different core.
There are actually some processors without interrupts where you have to leverage multiple cores to make this happen. They're more on the toy side, like the Parallax Propeller, but you could argue that leaving interrupts out of the design makes for a more economical processor. I wouldn't want to program one of those, though.
I'd argue that generally, unless you're on something very slow or where realtime/fast response time is very critical, the performance benefit from multicore isn't that big of a deal. I can't think of any cases where it'd offer a huge advantage like SunnyD was saying, though there could be a useful application I'm overlooking. One place where it is beneficial is if you have asymmetric cores, or if your cores are asleep a lot of the time and it takes less power and wakeup latency to have one core soak up all the background stuff versus pushing an active core harder. But that's more power-oriented than performance-oriented.