Judging by the various reviews out there, the M1 isn't really built to handle 8K video footage or some other types of >4K footage. It works, but the timeline jerkiness that plagues other Macs shows up quite often. According to SoC monitoring software, it often maxes out the "GPU", although I'm not entirely sure what that measurement actually covers. Don't ask me which GPU monitor applet the reviewers used, because I don't know, but I do note that Activity Monitor includes a GPU usage monitor as well. In contrast, the CPU cores aren't breaking a sweat.
n00b question:
Would it really be just the GPU, or does that include the image signal processor (ISP) that sits outside the GPU and CPU? If it truly is the GPU, then increasing the number of GPU cores would likely solve the problem. However, if it's the dedicated ISP, then they'd have to beef that up somehow to deal with this issue.
IOW, should M1X be more CPU cores + more GPU cores with the same existing ISP, or should the ISP also get a boost?
I think we're going to see, over time, some fascinating changes here.
The NPU, GPU, ISP, and even the media encoder all, from the thousand-foot view, do the same sort of thing: many, many, MANY small engines operating in parallel on (hopefully) independent pieces of data.
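To make the "independent pieces of data" point concrete, here's a toy Swift sketch (the frame size, tile size, and brightness tweak are all made up for illustration) that chops a frame into tiles and processes them in parallel. Each tile touches only its own pixels, which is exactly the property that lets an NPU/GPU/ISP/encoder throw lots of small engines at the work without coordination:

```swift
import Dispatch

// Toy data-parallel sketch: every tile is independent, so the tiles can be
// handed to as many "engines" (here, CPU threads) as you like.
let width = 1024, height = 1024, tileSize = 64
let tilesPerRow = width / tileSize
let tileCount = tilesPerRow * (height / tileSize)

// Flat 8-bit grayscale frame, initialized to mid-gray.
let frame = UnsafeMutableBufferPointer<UInt8>.allocate(capacity: width * height)
frame.initialize(repeating: 128)

DispatchQueue.concurrentPerform(iterations: tileCount) { tileIndex in
    let tileX = (tileIndex % tilesPerRow) * tileSize
    let tileY = (tileIndex / tilesPerRow) * tileSize
    for y in tileY ..< tileY + tileSize {
        for x in tileX ..< tileX + tileSize {
            let i = y * width + x
            // Arbitrary per-pixel operation; no tile ever reads or writes
            // another tile's pixels, so there is nothing to synchronize.
            frame[i] = UInt8(min(255, Int(frame[i]) + 20))
        }
    }
}

frame.deallocate()
```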
How valuable will it remain to keep these separated (and, of course, slightly better optimized for each particular task), rather than creating a single super-throughput engine that makes the specialist capabilities of an NPU (mainly small-integer multiplication?), a GPU (texture lookup and things like that), a media encoder (I assume mainly the ability to compare slightly shifted pixel blocks against each other to find the best matches), and an ISP (no idea what those need!) available to every unit, all built on a common framework of registers, instruction scheduling, synchronization, cache, and so on?
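On the media-encoder guess specifically: that "compare slightly shifted pixel blocks" step is motion estimation. A stripped-down software version of it (purely illustrative; the function names, block size, and search window below are mine, and real encoders do this in fixed-function hardware with far smarter search strategies) scores each candidate offset with a sum of absolute differences and keeps the best one:

```swift
/// Sum of absolute differences between a block in `current` and the same block
/// shifted by (dx, dy) in `reference`. Frames are flat 8-bit grayscale buffers.
/// Assumes the block plus the shift stays inside the frame.
func sad(current: [UInt8], reference: [UInt8], width: Int,
         blockX: Int, blockY: Int, blockSize: Int, dx: Int, dy: Int) -> Int {
    var total = 0
    for y in 0..<blockSize {
        for x in 0..<blockSize {
            let cur = Int(current[(blockY + y) * width + (blockX + x)])
            let ref = Int(reference[(blockY + dy + y) * width + (blockX + dx + x)])
            total += abs(cur - ref)
        }
    }
    return total
}

/// Exhaustively search a small window of offsets and return the best match.
func bestMotionVector(current: [UInt8], reference: [UInt8], width: Int,
                      blockX: Int, blockY: Int, blockSize: Int = 16,
                      searchRange: Int = 4) -> (dx: Int, dy: Int, score: Int) {
    var best = (dx: 0, dy: 0, score: Int.max)
    for dy in -searchRange...searchRange {
        for dx in -searchRange...searchRange {
            let score = sad(current: current, reference: reference, width: width,
                            blockX: blockX, blockY: blockY, blockSize: blockSize,
                            dx: dx, dy: dy)
            if score < best.score { best = (dx: dx, dy: dy, score: score) }
        }
    }
    return best
}
```

An encoder's fixed-function block presumably runs thousands of these comparisons at once, which is the same "many small engines on independent data" pattern again.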
There's an obvious win to this: the system can dynamically throw its full performance at whatever the current task is (your GPU is 2x as fast if it can recruit all the computation latent in the NPU, ISP, and h.265 encoder!). And it means designing ONE thing rather than many, and writing more common code.
And there's a cost. More generality means more power usage, and spreading usage this way may require slightly more area.
Is it an overall win? My guess is it could actually be, once someone has had time to figure out the optimal set of common primitives. But will it happen soon (soon meaning, say, before 2025)? Not a clue! I have no idea how much commonality there is between these blocks once you start drilling into the details.