Intel Nova Lake in H2-2026: Discussion Threads


DrMrLordX

Lifer
Apr 27, 2000
23,059
13,162
136
CPU encoding. Loves lots of cores.
Isn't handbrake known to crap out at around 32t before you get diminishing returns?


fps-vs-cores.png


edit: looking at some reviews of 9900X and 9950X, it looks like x265 and (even moreso) x264 are not scaling that well past 12c/24t. Basically getting at best ~20% scaling from 33% extra cores, and it only gets worse from there.
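
To put rough numbers on that, here is a minimal sketch. It assumes the slowdown is purely Amdahl's-law behaviour, which nobody in those reviews actually measured, and the function names are made up for illustration. It takes the 12c to 16c step and the ~20% gain quoted above and works out the scaling efficiency and the parallel fraction those two figures would imply.

```python
def scaling_efficiency(cores_before, cores_after, observed_gain):
    """Fraction of the extra cores that shows up as extra throughput."""
    extra_cores = cores_after / cores_before - 1.0
    return observed_gain / extra_cores


def implied_parallel_fraction(cores_before, cores_after, observed_gain):
    """Solve Amdahl's law for p from the speedup between two core counts.

    Model: T(n) = (1 - p) + p / n, with T(before) / T(after) = 1 + gain.
    """
    r = 1.0 + observed_gain
    a = 1.0 / cores_before
    b = 1.0 / cores_after
    return (r - 1.0) / (r - 1.0 + a - r * b)


if __name__ == "__main__":
    # 12c -> 16c with ~20% more fps, the figures from the post above.
    print(scaling_efficiency(12, 16, 0.20))
    print(implied_parallel_fraction(12, 16, 0.20))
```

Under that assumption the step comes out at about 60% efficiency and a parallel fraction of about 0.96, meaning roughly 4% of the work behaves as if it were serial.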
 

poke01

Diamond Member
Mar 8, 2022
4,556
5,853
106
Isn't handbrake known to crap out at around 32t before you get diminishing returns?


fps-vs-cores.png


edit: looking at some reviews of 9900X and 9950X, it looks like x265 and (even moreso) x264 are not scaling that well past 12c/24t. Basically getting at best ~20% scaling from 33% extra cores, and it only gets worse from there.
So this is where fast cores are better than more cores?
 

Fjodor2001

Diamond Member
Feb 6, 2010
4,375
651
126
@DrMrLordX:

The first test was from 2018. So the MT implementation in Handbrake and the underlying codec libraries may have improved since then.

Then regarding 9950X vs 9900X (and for the first test too), could the power constraint also not explain why performance does not scale linearly with core count in this case? I’m thinking that the 9950X cores will run at lower frequency than on 9900X, due to same TDP constraint and the former CPU having more cores so less power/core.
 

Hitman928

Diamond Member
Apr 15, 2012
6,746
12,466
136
Isn't handbrake known to crap out at around 32t before you get diminishing returns?


fps-vs-cores.png


edit: looking at some reviews of 9900X and 9950X, it looks like x265 and (even moreso) x264 are not scaling that well past 12c/24t. Basically getting at best ~20% scaling from 33% extra cores, and it only gets worse from there.
That’s probably 4K conversion as well. If you do 1080p or lower res, core scaling gets even worse.
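
One plausible reason, assuming x265's wavefront parallel processing (WPP) does much of the threading: WPP hands out work one row of coding tree units (CTUs) at a time, so a shorter frame has fewer rows to spread across threads. A tiny sketch, assuming the 64x64 CTU size; `ctu_rows` is only an illustrative helper, and frame-level threading adds parallelism on top that this ignores.

```python
import math


def ctu_rows(frame_height, ctu_size=64):
    """Number of CTU rows in one frame (partial rows count as a row)."""
    return math.ceil(frame_height / ctu_size)


if __name__ == "__main__":
    for name, height in (("1080p", 1080), ("4K", 2160)):
        print(name, ctu_rows(height))
```

A 1080p frame has 17 such rows and a 4K frame has 34, so there is about half as much row-level work to hand out per frame at 1080p.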
 

Hitman928

Diamond Member
Apr 15, 2012
6,746
12,466
136
@DrMrLordX:

The first test was from 2018. So the MT implementation in Handbrake and the underlying codec libraries may have improved since then.

Then regarding 9950X vs 9900X (and for the first test too), could the power constraint also not explain why performance does not scale linearly with core count in this case? I’m thinking that the 9950X cores will run at lower frequency than on 9900X, due to same TDP constraint and the former CPU having more cores so less power/core.

The 9900x has a much lower power limit than the 9950x and is just as power constrained. Video encoding just quickly hits diminishing returns once you get past 24t or so and that’s with high res (4K), modern formats. The vast majority of people will see even less benefit because they aren’t encoding that high res and are probably still using x264.
 

Fjodor2001

Diamond Member
Feb 6, 2010
4,375
651
126
The 9900x has a much lower power limit than the 9950x and is just as power constrained. Video encoding just quickly hits diminishing returns once you get past 24t or so and that’s with high res (4K), modern formats. The vast majority of people will see even less benefit because they aren’t encoding that high res and are probably still using x264.
I wonder what would happen to the total processing time needed if you split the movie into two equal sized parts. Then run two Handbrake instances in parallel each transcoding one of the parts. And finally merge the two parts in the end. :)

Or perhaps even simpler: Run two (or more!) Handbrake instances in parallel, each transcoding a separate movie. Then on a 52C NVL-S there’ll be 26C per Handbrake instance. So the CPU core count scaling issue should have much less impact.

This is assuming that the issues are Amdahl’s law related. If it’s some other bottleneck that gets hit when increasing core count when using Handbrake, then the solutions above may not work.
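
A rough sketch of the split, encode and merge idea. It swaps in ffmpeg rather than Handbrake, purely as an assumption for illustration, and only builds the command lines without running anything. The file names, the two-hour length and the helper names are all hypothetical; `pools=` is the x265 parameter that limits its thread pool, passed through ffmpeg's `-x265-params`.

```python
def plan_chunks(total_seconds, parts):
    """Return (start, duration) pairs that cover the movie in equal parts."""
    length = total_seconds / parts
    return [(i * length, length) for i in range(parts)]


def encode_command(source, start, duration, output, threads):
    """Build (not run) an ffmpeg command that encodes one chunk with x265."""
    return [
        "ffmpeg", "-ss", str(start), "-t", str(duration), "-i", source,
        "-c:v", "libx265", "-x265-params", f"pools={threads}",
        "-c:a", "copy", output,
    ]


def concat_list(outputs):
    """Text for ffmpeg's concat demuxer list file, one chunk per line.

    Names containing quote characters would need escaping; not handled here.
    """
    return "".join(f"file '{name}'\n" for name in outputs)


def merge_command(list_file, output):
    """Build the stream-copy merge; nothing is re-encoded in this step."""
    return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", list_file,
            "-c", "copy", output]


if __name__ == "__main__":
    # Hypothetical 2-hour movie, 2 parts, 26 threads each (52C split in two).
    parts = plan_chunks(7200, 2)
    outputs = [f"part{i}.mkv" for i in range(len(parts))]
    for (start, duration), out in zip(parts, outputs):
        print(" ".join(encode_command("movie.mkv", start, duration, out, 26)))
    print(concat_list(outputs), end="")
    print(" ".join(merge_command("parts.txt", "merged.mkv")))
```

The two encode commands would be launched at the same time and the merge run once both finish. Whether the cut points, audio and chapters survive the merge cleanly is something to check on a real file.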
 

Hitman928

Diamond Member
Apr 15, 2012
6,746
12,466
136
So this is where fast cores are better than more cores?
Once you get past the 24ish thread count I mentioned, then yes, at least generally speaking. Just look at where the 5950x sits in these results. Twice the cores/threads of the 7700x but still loses to it.

1764507441736.png

If you bump up to 4K, things are different, but it still loses to a lower core count Zen 4 CPU. You can get different results with different settings and source videos, but the diminishing returns with large core counts holds true in pretty much every case.

1764507606839.png
 

Hitman928

Diamond Member
Apr 15, 2012
6,746
12,466
136
I wonder what would happen to the total processing time needed if you split the movie into two equal sized parts. Then run two Handbrake instances in parallel each transcoding one of the parts. And finally merge the two parts in the end. :)

Or perhaps even simpler: Run two (or more!) Handbrake instances in parallel, each transcoding a separate movie. Then on a 52C NVL-S there’ll be 26C per Handbrake instance. So the CPU core count scaling issue should have much less impact.
Total time would go up because you’d still need to export the merged video at the end, so you’d just be wasting cycles.

Edit: You can do multiple encodes at a time, but the vast majority of people aren’t producing multiple videos simultaneously, and those who are typically use GPU encoding to massively speed up encoding time. The few real professionals who may do multiple videos and want to only use CPU encoding are pretty much all production studios that have server or Threadripper workstations to get the job done; they don’t touch consumer hardware.

I’m sure there are some who do multiple encodes on consumer hardware, I’ve done it occasionally myself, but you’re talking about a tiny percentage of a small percentage of the market, and most likely, as in my case, it’s not a determining factor in their purchase decision.
 

Fjodor2001

Diamond Member
Feb 6, 2010
4,375
651
126
Total time would go up because you’d still need to export the merged video at the end, so you’d just be wasting cycles.
Not sure what you mean. Merging two MKV parts does not require transcoding. So it’s a much quicker operation.
Edit: You can do multiple encodes at a time, but the vast majority of people aren’t producing multiple videos simultaneously, and those who are typically use GPU encoding to massively speed up encoding time. The few real professionals who may do multiple videos and want to only use CPU encoding are pretty much all production studios that have server or Threadripper workstations to get the job done; they don’t touch consumer hardware.
I would say it’s quite a normal scenario, but it depends on the user and use case. E.g. you could rip or download some 4K movies and then want to transcode them to reduce size. So you might as well transcode several of those movies in parallel.

Same if you e.g. have video sequences from your vacation that you want to transcode to reduce size.
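
A minimal sketch of the "several movies in parallel" flow, assuming you supply your own `encode_one(source, threads)` function that launches whatever encoder you use. That callable and the helper names are hypothetical; the demo below uses a stand-in that encodes nothing.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def threads_per_job(total_threads, jobs):
    """Split the machine evenly; every job gets at least one thread."""
    return max(1, total_threads // jobs)


def run_batch(sources, jobs, encode_one, total_threads=None):
    """Run encode_one(source, threads) for every source, `jobs` at a time.

    Results come back in the same order as `sources`.
    """
    total = total_threads or os.cpu_count() or 1
    share = threads_per_job(total, jobs)
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(lambda src: encode_one(src, share), sources))


if __name__ == "__main__":
    # Stand-in encoder: just reports what it would have been asked to do.
    def fake_encode(source, threads):
        return f"{source}: {threads} threads"

    for line in run_batch(["a.mkv", "b.mkv", "c.mkv"], 2, fake_encode,
                          total_threads=52):
        print(line)
```

With 52 threads and two jobs at a time, each encode gets a share of 26, which keeps every instance in the range where the encoder still scales reasonably.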

Using GPU is faster, but the output video quality will be worse compared to if transcoding using the CPU.
 

Hitman928

Diamond Member
Apr 15, 2012
6,746
12,466
136
Not sure what you mean. Merging two MKV parts does not require transcoding. So it’s a much quicker operation.

I haven’t done that flow before but I guess it could work. It might mess up some features (subtitles, chapters) but maybe not.

I would say it’s quite a normal scenario, but it depends on the user and use case. E.g. you could rip or download some 4K movies and then want to transcode them to reduce size. So you might as well transcode several of those movies in parallel.

This is not “normal” at all. The number of consumers who do any kind of encoding is small to begin with. Those who want to do large batch processing without acceleration (i.e., only on the CPU) are nearly non-existent. This is coming from one of the very few people who have done it.
 

Fjodor2001

Diamond Member
Feb 6, 2010
4,375
651
126
This is not “normal” at all. The number of consumers who do any kind of encoding is small to begin with. Those who want to do large batch processing without acceleration (i.e., only on the CPU) are nearly non-existent. This is coming from one of the very few people who have done it.
Well, we’re talking about the subsection of the market that does video transcoding. That subsection may be small. But within that subsection I think transcoding several movies should be quite common. So then it might as well be done for several movies in parallel, especially going forward now if we’re going to see substantial speedups if doing it that way.
 

Hitman928

Diamond Member
Apr 15, 2012
6,746
12,466
136
Well, we’re talking about the subsection of the market that does video transcoding. That subsection may be small. But within that subsection I think transcoding several movies should be quite common. So then it might as well be done for several movies in parallel, especially going forward now if we’re going to see substantial speedups if doing it that way.
For the five people in the world who do this, sure.
 

Nothingness

Diamond Member
Jul 3, 2013
3,342
2,432
136
Once you get past the 24ish thread count I mentioned, then yes, at least generally speaking. Just look at where the 5950x sits in these results. Twice the cores/threads of the 7700x but still loses to it.

View attachment 134583

If you bump up to 4K, things are different, but it still loses to a lower core count Zen 4 CPU. You can get different results with different settings and source videos, but the diminishing returns with large core counts holds true in pretty much every case.

View attachment 134584
Could it be that such video encoding is inherently MP limited (I have no knowledge of the underlying algorithms so this might just be plain stupid)?

Anyway this shows expecting nice MT speedups even for a task that looks highly parallel is a fallacy.
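
For a feel of how quickly that bites, here is a small Amdahl's-law sketch. The 95% parallel fraction is an illustrative assumption, not a measured figure for x264 or x265, and the function names are made up.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Speedup over one worker when only part of the job parallelises."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / workers)


def speedup_ceiling(parallel_fraction):
    """Upper bound on speedup with unlimited workers."""
    return 1.0 / (1.0 - parallel_fraction)


if __name__ == "__main__":
    p = 0.95  # assumed parallel fraction, for illustration only
    for n in (8, 16, 24, 32, 52):
        print(n, round(amdahl_speedup(p, n), 2))
    print("ceiling", round(speedup_ceiling(p), 2))
```

Even with 95% of the work parallel, the ceiling is 20x no matter how many cores are added, so each extra core buys less than the one before it.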
 

Fjodor2001

Diamond Member
Feb 6, 2010
4,375
651
126
Could it be that such video encoding is inherently MP limited
No. I mean you should in theory just be able to split the movie into X parts and then transcode each of those parts in parallel completely independently, and join/merge the output of each part.

I don’t know why the current x265 codec is not capable of that, though, so it does not scale completely linearly with the number of CPU cores beyond a certain point. Possibly it could be some bad implementation that does not handle threading ideally. If so, future versions of the x265 codec may improve in this regard.

Or there’s some other resource bottleneck, but I’m unsure what that would be. Memory speed, disk speed, …? Does not seem likely in this case, at least if encoding with high video quality (i.e. more CPU crunching needed per amount of video data, so less new data need to be read/written per time unit), and we’re not talking about some huge thread count like 256T or whatever and/or using HDD instead of SSD.
 

Hitman928

Diamond Member
Apr 15, 2012
6,746
12,466
136
If it really would be that few people who do video transcoding, then it’s strange that it’s a quite common benchmark to include in CPU reviews.
I have tried to be very clear about it, but I’ll try again: there is a huge difference between the number of people doing video transcoding and the number doing it strictly on the CPU with parallel encodes. The latter is basically nonexistent. Even with single encode jobs, the fact that reviewers use it only means it’s an easy-to-run, repeatable test; it doesn’t mean it’s actually representative of what is done outside of reviews.
 

Geddagod

Golden Member
Dec 28, 2021
1,589
1,655
106
The dies are 55mm2 for Clearwater forest so yields shouldn't be a problem
better yields always help, but yea smaller dies mitigate it
It is possible that it is cheaper to produce than if we count margin stacking
The design cost difference should be mid/low double digits, and then you have to consider Intel foundries wafer cost prices solely from the process POV vs TSMC's (should be higher).
It's possible but I doubt it.
Intel usually base it on SPEC In server like AMD does
You can always check Intel.com/performanceindex
They don't list it for their comparison to SRF
and I feel like Intel also botched the L3 on Clearwater Forest hence the issue(based on Years of Intel Botching L3 in Client and Server) the fabric for sure is bad only 35 GB/s.
Maybe, as David C1 pointed out Skymont got like half of the IPC uplift in DC as they did in client
But currently the implication is that a Zen 5C core has like a ~25% perf/watt advantage over a Darkmont core when taking away the SMT advantage.
And Intel has advantages too coming from more advanced packaging and faster mem support.
 

Doug S

Diamond Member
Feb 8, 2020
3,726
6,585
136
Well, we’re talking about the subsection of the market that does video transcoding. That subsection may be small. But within that subsection I think transcoding several movies should be quite common. So then it might as well be done for several movies in parallel, especially going forward now if we’re going to see substantial speedups if doing it that way.

Sorry but the scenario you've described doesn't require parallel operation. It doesn't even require fast operation. You only transcode movies once and it isn't like you're waiting on the transcode to complete to immediately watch several movies at once. You don't care whether it takes 5 minutes per movie or 5 hours, you just want it to run in the background and not affect foreground tasks to any noticeable degree.
 

Khato

Golden Member
Jul 15, 2001
1,318
391
136
Speaking as one of that small subsection that far prefers the superior video and audio quality of Blu-ray along with the convenience of having copies on a server rather than directly watching the discs... Yup, I care more about the quality and power efficiency of the transcode than whether it takes 4 or 24 hours to complete. Arrow Lake was a marked improvement over Raptor Lake in both efficiency and transcode time, which was certainly nice.