Why is it so difficult to make Handbrake better at using multicores?

Hulk

Diamond Member
Oct 9, 1999
5,118
3,656
136
First I want to say that I'm not saying it isn't difficult. I know there is a reason; it's just that I don't know what it is. My programming experience is limited: some assembly back in high school on my Atari 2600, and then Fortran (!!) back in college when I was studying mechanical engineering. I know that currently Handbrake uses multicores quite effectively up to about 6 cores and then less so after that.

Anyway, to my question. I'm wondering why each core in a multicore system can't be assigned to work on a single GOP (group of pictures) in Handbrake? In a very simplified view, I would think the following could occur.

Using a not-very-fine-grained approach, 1 core could be dedicated to scanning the source file to determine the frame range for each GOP. Once a GOP was determined, it would then be assigned to core #2, the next GOP to core #3, and so on. Depending on how compute intensive this GOP scanning operation is, core #1 could also stitch the rendered GOPs together as the other cores are transcoding the video. It seems as though encoding 1 GOP is a completely independent operation requiring no data except for the original video from that specific GOP.

If more cores are needed for the scanning and stitching then they would be allocated in such a way as to keep all remaining cores transcoding all of the time.
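The fan-out/stitch pipeline described above can be sketched in a few lines. This is purely illustrative: `split_into_gops` and `encode_gop` are hypothetical stand-ins (the fake "encode" just doubles frame numbers), and a real encoder would use separate processes rather than the thread pool used here for simplicity.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_gops(frames, gop_size):
    """Core #1's job in the idea above: find GOP boundaries.
    Here a 'GOP' is just a fixed-size slice of frame numbers."""
    return [frames[i:i + gop_size] for i in range(0, len(frames), gop_size)]

def encode_gop(gop):
    """Hypothetical independent per-GOP encode (fake transform here)."""
    return [f * 2 for f in gop]  # placeholder for real transcoding work

def parallel_encode(frames, gop_size=4, workers=4):
    gops = split_into_gops(frames, gop_size)
    # A real encoder would want processes, not threads, to use all cores;
    # a thread pool keeps this sketch simple and self-contained.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order, so "stitching" the finished
        # GOPs back together is just concatenation.
        encoded = list(pool.map(encode_gop, gops))
    return [f for gop in encoded for f in gop]

if __name__ == "__main__":
    print(parallel_encode(list(range(12)), gop_size=4))
```

The catch, as the replies below this kind of question usually point out, is that real GOP boundaries and per-GOP bitrate decisions are not independent of their neighbors the way this toy makes them look.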

Obviously this is not possible, because I'm not understanding the whole story here. Could someone explain how this really works and why my idea cannot work?
 

Fardringle

Diamond Member
Oct 23, 2000
9,200
765
126
I'm not so sure it's all that hard for them to do it. More likely they just aren't willing to waste resources on making it more efficient with more than 6 cores because they are (apparently) actively working on a new version that will do all of the encoding on the GPU, potentially making it a LOT faster than it would be even if it made use of dozens of CPU cores...
 

Hulk

Diamond Member
Oct 9, 1999
5,118
3,656
136
Interesting. Where did you read this? I'd like to check it out.
 

Cogman

Lifer
Sep 19, 2000
10,284
138
106
The simple answer is that encoding is a complex process not well suited for multi-threading. Oftentimes, by parallelizing it you sacrifice encoding quality for speed.

Take the VP9 approach as an example. In order to make things work in parallel, they split each frame up into "tiles" and encode those tiles in parallel. That works great for high-resolution, low-motion scenes. However, as soon as your resolution starts dipping, or you start dealing with objects moving from one tile to the next frequently, you end up with some subpar encodes.

There are portions of the process you could feed off into different threads. However, another big issue is identifying enough work to make splitting tasks off into their own threads worth it. A major issue with multi-threading is that coordination isn't cheap. It's really easy to write software that burns all your cores while going slower than it would on a single core (context switching is a BEAST).
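A toy illustration of that coordination cost: both functions below compute the same sum, but the threaded version takes a lock on every tiny addition, so it pays for locking and context switching on work that's far too fine-grained to be worth parallelizing. (This is a deliberately bad parallelization, to show the failure mode, not a recommended pattern.)

```python
import threading

def sum_single(n):
    # Plain single-threaded loop: no coordination overhead at all.
    total = 0
    for i in range(n):
        total += i
    return total

def sum_threaded(n, workers=8):
    # Same result, but every addition is a "task" guarded by a lock,
    # so the threads spend their time coordinating, not computing.
    total = 0
    lock = threading.Lock()

    def worker(chunk):
        nonlocal total
        for i in chunk:
            with lock:  # coordination: every single add takes the lock
                total += i

    chunks = [range(w, n, workers) for w in range(workers)]
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total
```

On work this fine-grained, the threaded version is typically slower despite using all your cores, which is exactly the trap described above.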

The best approach, if you have enough memory for it, is to run multiple instances of the encoder. That is, if you have 10 videos, start up 10 encoder sessions in single-threaded mode. That will result in the highest quality and best utilization of your CPU.
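A sketch of that "many single-threaded instances" approach, driving a CLI encoder from Python. The exact HandBrakeCLI flags are illustrative assumptions (check `HandBrakeCLI --help` for your version); the `dry_run` mode just returns the commands it would run.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def build_cmd(src, dst):
    # Flags are illustrative; verify against your HandBrakeCLI version.
    # "threads=1" is passed to the video encoder to keep each job
    # single-threaded, so parallelism comes from running many jobs.
    return ["HandBrakeCLI", "-i", src, "-o", dst, "--encopts", "threads=1"]

def encode_all(jobs, workers=10, dry_run=False):
    """jobs is a list of (source, destination) pairs."""
    cmds = [build_cmd(src, dst) for src, dst in jobs]
    if dry_run:
        return cmds
    # Threads are fine here: each thread just waits on one external
    # encoder process, and those processes use the CPU cores.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda c: subprocess.run(c, check=True), cmds))
    return cmds
```

The same pattern works with any encoder frontend; the key design point is that each external process is independent, so there is zero cross-encode coordination to pay for.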
 