Has he not heard about job-based queue systems, which may prove to be more scalable than traditional threaded code?
Strange that something decades old is not traditional, today.

IoW, I'm sure he has, and that has nothing to do with it. Amdahl's Law may not strictly apply to splitting a multitude of partially-dependent tasks into subtasks that may run concurrently on parallel processors (in that one can't produce a neat little "X% can be parallelized" figure), but the net result is similar: extra work, diminishing returns, then hard limits. Putting parts of ready tasks in queues is nothing new, and doesn't magically solve any problems. It sensibly allows for stable, well-performing solutions to already-solved problems, at the cost of rebuilding the code base from scratch.
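The diminishing-returns curve is easy to see with a rough back-of-envelope sketch (the 90% parallel fraction below is an illustrative assumption, not a measurement of any real program):

```python
# Rough Amdahl's Law sketch: even a generously 90%-parallel workload
# hits diminishing returns quickly as core counts climb.

def amdahl_speedup(parallel_fraction, cores):
    """Upper bound on speedup for a workload with the given parallel fraction."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

for cores in (2, 4, 8, 16, 50):
    print(cores, round(amdahl_speedup(0.90, cores), 2))
```

At 50 cores the speedup is still under 8.5x, which is the "hard limits" part of the argument.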
You can't magically make software that isn't embarrassingly parallel scale out to, say, 50 CPU cores, and most programs will not have the dev resources to make it even to 4-8, like games do. OTOH, as long as gaming stays a big industry, game engines are likely to scale out ever better, and will get close to their limits within genres (big multiplayer FPS engines might make it to 12+ cores, while sim games might finally make real use of 3+, in a few generations).
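For reference, the job-queue pattern mentioned above is just this — workers pulling ready tasks off a shared queue (a minimal sketch; the squaring stands in for real, already-independent work, which is exactly the part the pattern doesn't create for you):

```python
import queue
import threading

# Minimal job-queue sketch: worker threads pull independent tasks from a
# shared queue. This only helps when tasks are already divisible; it does
# not make dependent work parallel.

jobs = queue.Queue()
results = queue.Queue()

def worker():
    while True:
        n = jobs.get()
        if n is None:          # sentinel: shut this worker down
            jobs.task_done()
            break
        results.put(n * n)     # stand-in for real, independent work
        jobs.task_done()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

for n in range(10):
    jobs.put(n)
for _ in threads:
    jobs.put(None)
jobs.join()
for t in threads:
    t.join()

print(sorted(results.queue))
```

The queue is the easy part; carving the program into tasks that can sit in it is the rebuild-from-scratch cost.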
In all reality, this is not a problem. The context is dead ideas for general computing; a dead horse. Parallel-geared coprocessors: yes. Many weaker cores for parallel processing in lieu of our hundreds-of-millions-of-transistors big cores: no (at least not unless an affordable silicon replacement is made, in which case all bets are off). It's also not a problem in that, up to several cores, we can take advantage of process- and thread-level concurrency that does not require each process to itself scale out, but merely not to block another one at that time.
Depending on the task, parallelism can be pointless.
However, a properly designed OS can easily keep a ton of small lightweight cores busy just by handling background tasking, while freeing up the heavyweight cores for the more important, user-demanded heavyweight operations that keep the system responsive. This, imho, would be the holy grail of "parallelism" at the OS level for a multi/many-core CPU at this point. Fewer places for the OS to get bottlenecked or backlogged equals a much more responsive system. It's not exactly traditional parallelism, though.
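As a toy illustration of that routing idea — background chores to a "small core" pool, latency-sensitive work to a "big core" pool (the pool sizes and the background/foreground split here are invented for illustration; a real OS does this with scheduling classes and core affinity, not thread pools):

```python
from concurrent.futures import ThreadPoolExecutor

# Toy sketch of big/little task routing. The max_workers values are
# made up; they stand in for counts of fast vs. efficiency cores.
big_pool = ThreadPoolExecutor(max_workers=2)    # pretend: fast cores
small_pool = ThreadPoolExecutor(max_workers=6)  # pretend: efficiency cores

def submit(task, background=False):
    """Route identified background work away from the big cores."""
    pool = small_pool if background else big_pool
    return pool.submit(task)

# Usage: indexing runs "on the little cores", the UI task on the big ones.
index_job = submit(lambda: sum(range(1000)), background=True)
ui_job = submit(lambda: "frame rendered")
print(index_job.result(), ui_job.result())

big_pool.shutdown()
small_pool.shutdown()
```

The hard part, as noted below, is the `background=True` decision itself, not the routing.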
But exactly which tasks are those? Identifying them is the problem, and if you give devs the choice, they'll want theirs on the big cores every single time. If the choice is wrong, the user basically has a slower computer than they should have. Meanwhile, giving identified background processes less time on the big CPU cores offers the same net result without the added complications. We didn't go from mainframes to single-CPU to SMP for nothing...