Question: How in the world has AMD got the Ryzen 7600X and 7700X priced the same as the 13600K and 13700K when they are inferior, even comparing P cores only?


Wolverine2349

Member
Oct 9, 2022
157
64
61
I mean, the Ryzen 7700X is an 8-core CPU and the Ryzen 7600X a 6-core CPU. The 7700X is $399 and the 7600X is $299.

Intel has the Core i7 13700K priced at $399 and the Core i5 13600K at $299. Those CPUs have better P cores, being 8- and 6-core counterparts with slightly better IPC than Zen 4, and they can clock as high or higher with similar power usage. Those who do not like E-cores (I am one of them, but I love Intel P cores) can disable them and get better 6- and 8-core CPUs from Intel Raptor Lake than from AMD Ryzen. And those who want E-cores get them as well, for the same price and with better P cores at equal core counts.

So what is AMD thinking? They still have not budged on the prices of the 7600X and 7700X. They are pricing them as if their 6- and 8-core Zen 4 parts are better than Intel's Raptor Cove cores of equal count, even though they are not any better and arguably not as good. Or is that debatable?

The Ryzen 7900X and 7950X prices make more sense, as there you get more than 8 strong cores, and AMD has buyers cornered who want more than 8 cores and do not want to go the hybrid route. So yeah, the 7900X and 7950X prices make sense.

But the 7600X and 7700X are almost a rip-off unless you just have to have AMD, as they do nothing better than the 13600K and 13700K at the exact same price, have slightly weaker P cores, and offer no additional E-cores for those who like that option. (And for those who do not, it is easy to disable them and get the better 6- and 8-core chips for the same price.)

It's puzzling to me that AMD is behaving as if they are still superior in all ways, like they were with Ryzen 5000 from November 2020 to November 2021, when Intel had no competition on core count or per-core IPC; that window lasted only a year. AMD is still the much smaller company and was the underdog for years, and it is hard to believe they think they can act like a premium brand in the 6- and 8-core CPU segment when the 7600X and 7700X are worse than their Intel counterparts even with the E-cores off.
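For what it's worth, the pricing argument above reduces to simple per-core arithmetic. A quick sketch using the prices and P-core counts as cited in this thread (the figures are the thread's, not verified MSRPs):

```python
# Prices and P-core counts as cited in this thread (not verified MSRPs).
chips = {
    "7600X":  (299, 6),   # Zen 4, 6 cores
    "7700X":  (399, 8),   # Zen 4, 8 cores
    "13600K": (299, 6),   # Raptor Lake, 6 P-cores (+8 E-cores)
    "13700K": (399, 8),   # Raptor Lake, 8 P-cores (+8 E-cores)
}

for name, (price, p_cores) in chips.items():
    print(f"{name}: ${price / p_cores:.2f} per P-core")
```

On these numbers the dollar-per-P-core figures are identical at each tier, which is exactly the complaint: the Intel parts throw in the E-cores at no extra cost.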

Your thoughts?
 

JustViewing

Member
Aug 17, 2022
135
232
76
This isn't true. Under heavy compute loads, Raptor Lake starts to really take long strides. This is from the Computerbase.de review and shows how the 13900K is significantly ahead of the 7950x in AV1 encoding, finishing the 4K60 transcoding test in 16% less time. I find this mightily impressive because AV1 encoding is very compute heavy and RL only has 8 big cores which are shouldering the vast majority of the workload.

I think average workloads show less discrepancy because there is less ILP in the code. But when there's enough ILP RPL will pull ahead courtesy of its wider architecture and higher throughput.

Not sure about this benchmark: is it using Quick Sync? If not, it is most likely using AVX-128; otherwise the E-cores would be beaten badly by 16 AVX-256 Zen 3/4 cores.

Having said that, I am not denying there can be outliers like this. But looking at the result, it doesn't seem correct. It may have "Intel optimization".

[attached benchmark chart]
 
  • Like
Reactions: Tlh97 and scineram

Thunder 57

Platinum Member
Aug 19, 2007
2,674
3,796
136
This isn't true. Under heavy compute loads, Raptor Lake starts to really take long strides. This is from the Computerbase.de review and shows how the 13900K is significantly ahead of the 7950x in AV1 encoding, finishing the 4K60 transcoding test in 16% less time. I find this mightily impressive because AV1 encoding is very compute heavy and RL only has 8 big cores which are shouldering the vast majority of the workload.

I think average workloads show less discrepancy because there is less ILP in the code. But when there's enough ILP RPL will pull ahead courtesy of its wider architecture and higher throughput.

[attached Computerbase benchmark chart]

Anybody can cherry-pick. If anything, I expect Zen 3/4 to do better in H.265 and AV1, since they use AVX much more heavily than H.264/AVC. I can speculate as well.

I'm talking about IPC, not overall performance. The rumors before Zen 4 launched were that it would have brutal IPC increase of 25% and up.

Probably because nearly the entire forum is an AMD echo chamber. :D And no house is being burned down with RL, let's not get carried away here.

I just undervolted mine some more and it's running at a 77°C package temperature at 5.2 GHz on air cooling.

Nice straw man.

I aim to please. And let's be honest: one-sided views are typically boring and don't keep the forum alive. Conflict and dissent are the bread and butter of any forum.

Agreed. Thank you for providing a different viewpoint. No one (should) want to live in an echo chamber. Too much of that going on lately.

I am all for competition; without AMD, Intel would not have released an Alder Lake i3 at such a low price. However, I miss the days when AMD cut its CPU prices in half even before Intel released its latest CPU (was that around Conroe?). It is ironic that AMD is now slow to react to the threat from Intel. :D

Intel is going to announce 13th-gen i5 desktop CPUs for the sub-$200 market, and this time they are bundling 4 E-cores with 6 P-cores (these should be Alder Lake dies). I am now wondering how much AMD is going to charge for the upcoming 7600 with only 6 cores... :cool:

Maybe they cut the prices of the NetBurst CPUs in half because Conroe beat the snot out of them in every way. Conroe did a good job on AMD as well, outside of servers; Intel really needed Nehalem to win that one back.

Also, from what I have read, the upcoming 13th gen is all just a refresh of Alder Lake. It would be nice to be wrong, but we shall see.
 
  • Like
Reactions: Tlh97

Abwx

Lifer
Apr 2, 2011
10,940
3,441
136
This isn't true. Under heavy compute loads, Raptor Lake starts to really take long strides. This is from the Computerbase.de review and shows how the 13900K is significantly ahead of the 7950x in AV1 encoding, finishing the 4K60 transcoding test in 16% less time. I find this mightily impressive because AV1 encoding is very compute heavy and RL only has 8 big cores which are shouldering the vast majority of the workload.

I think average workloads show less discrepancy because there is less ILP in the code. But when there's enough ILP RPL will pull ahead courtesy of its wider architecture and higher throughput.

[attached Computerbase benchmark chart]

The numbers for AV1 are dubious at best, and there is likely GPU acceleration, as in Adobe Premiere, or a dedicated hardware block. In their regular Handbrake test, Zen 4 is 13% ahead at stock settings; how could RL suddenly be that much better with the same app and another video format?
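A small aside when reading these charts: "finishing in 16% less time" and "being 16% faster" are different claims, and conflating them skews the comparison. A quick conversion sketch:

```python
def time_saving_to_speedup(frac_less_time):
    """Convert 'finishes in X% less time' into a throughput speedup factor."""
    # Finishing in (1 - p) of the time means throughput is 1 / (1 - p).
    return 1.0 / (1.0 - frac_less_time) - 1.0

# 16% less wall-clock time is roughly a 19% throughput advantage.
print(f"{time_saving_to_speedup(0.16):.4f}")
```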

 
  • Like
Reactions: Tlh97 and scineram

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Not sure about this benchmark: is it using Quick Sync? If not, it is most likely using AVX-128; otherwise the E-cores would be beaten badly by 16 AVX-256 Zen 3/4 cores.

Having said that, I am not denying there can be outliers like this. But looking at the result, it doesn't seem correct. It may have "Intel optimization".

[attached benchmark chart]

It's not using Quick Sync; it's software. It says so in the graph if you use the translate function. As far as I know, Quick Sync doesn't even support AV1 encoding. Intel's DG2, which is used in its Arc GPUs, does support AV1 encoding, however.

The reason why Intel pulls ahead is, as I said, that it has higher SIMD throughput and can do 3x 256-bit loads per cycle (plus it has the cache bandwidth to support it), while Zen 4 does 2x 256-bit loads per cycle. Zen 4 and Zen 3 have the same AVX throughput.

The 13700K is faster than the 12900KS due to having more cache bandwidth and better prefetch.

I can't believe you guys are surprised at this. If you had read any of the architectural analyses of Golden Cove and Zen 4, you'd know exactly why this benchmark looks the way it does.
 
  • Haha
Reactions: lobz

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Anybody can cherry pick. If anything I expect Zen 3/4 to do better in h.265 and AV1 since they use AVX much more heavily than h.264/AVC. I can speculate as well.

But why would you expect them to do better when they can only do 2x 256-bit loads per cycle, while RPL can do 3x 256-bit loads per cycle and sustain it? That's why RPL is so fast in HEVC and especially AV1.

AV1 uses SIMD optimization much more heavily than H.265, as it's a newer standard.

Agreed. Thank you for providing a different viewpoint. No one (should) want to live in an echo chamber. Too much of that going on lately.

Absolute truth right here, and not just talking about tech stuff either.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
The numbers for AV1 are dubious at best, and there is likely GPU acceleration, as in Adobe Premiere, or a dedicated hardware block. In their regular Handbrake test, Zen 4 is 13% ahead at stock settings; how could RL suddenly be that much better with the same app and another video format?

Intel's Quick Sync does not support AV1 encoding. I checked.

The reason why AV1 performance is better on RPL is that AV1 leans heavily on SIMD throughput, where RPL has an advantage. RPL can do 3x 256-bit loads per cycle while Zen 4 can only do 2x 256-bit loads per cycle. RPL can also sustain that throughput because its cache bandwidth is much greater than Zen 4's, and even Alder Lake's, on a per-core basis.

In any workload with high amounts of ILP, Raptor Lake generally pulls ahead because the architecture is wider and has more throughput.
 

coercitiv

Diamond Member
Jan 24, 2014
6,187
11,858
136
The reason why AV1 performance is better on RPL is because AV1 leans heavily on SIMD throughput, for which RPL has an advantage. RPL can do 3x 256 bit loads per cycle while Zen 4 can only do 2x 256 bit loads per cycle. RPL can also sustain that throughput because the cache bandwidths are much greater than Zen 4 and even Alder Lake on a per core basis.
https://openbenchmarking.org/test/pts/svt-av1

Testing was done using the SVT-AV1 open-source encoder originally developed by Intel. Take your time; go through the various tests using version 2.6 and different encoding profiles and input resolutions. On average the 13900K is equal to the 7950X, if not slightly behind.
 
  • Like
Reactions: lightmanek

JustViewing

Member
Aug 17, 2022
135
232
76
It's not using QuickSync, it's software. It says it there in the graph if you use the translate function. As far as I know, Quick Sync doesn't even support AV1 encoding. Intel's DG2 which is used in its Arc GPUs does support AV1 encoding howeverr.

The reason why Intel pulls ahead is as I said, because it has higher SIMD throughput and can do 3x 256 bit loads per cycle (plus it has the cache bandwidth to support it) while Zen 4 does 2x 256 bit loads per cycle. Zen 4 and Zen 3 have the same AVX throughput.

The 13700K is faster than the 12900KS due to having more cache bandwidth and better prefetch.

I can't believe you guys are surprised at this. If you read any of the architectural analysis for Golden Cove and Zen 4, you'd know exactly why this benchmark is the way it is.

So does the encoding happen at a rate of 3 x 32 B x 5 GHz = 480 GB/s? If not, the execution pipeline will be the bottleneck, not the read bandwidth. Intermediate values are always stored in registers. And for all cores, is it 480 x 24 GB/s?

Also, if it is a multi-threaded test, the E-cores with their 128-bit AVX can't touch Zen 3/4 cores.

Quick Sync may be used for decoding in this test.
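The per-core figure above is easy to check; the shakier step is extrapolating it to all 24 cores, since the E-cores have narrower datapaths. A sketch of the arithmetic, using only the numbers quoted in this thread:

```python
# Peak L1 load bandwidth per P-core, per the figures quoted above.
loads_per_cycle = 3        # claimed for Raptor Cove
bytes_per_load = 32        # one 256-bit AVX register
freq_ghz = 5.0

per_p_core_gbs = loads_per_cycle * bytes_per_load * freq_ghz
print(per_p_core_gbs)      # 480.0 GB/s per P-core

# Scaling by 24 assumes every core sustains the P-core figure; the 16
# E-cores have 128-bit datapaths, so the true all-core peak is lower.
naive_all_core_gbs = per_p_core_gbs * 24
print(naive_all_core_gbs)
```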
 
  • Like
Reactions: scineram

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
https://openbenchmarking.org/test/pts/svt-av1

Testing was done using the SVT-AV1 open-source encoder originally developed by Intel. Take your time; go through the various tests using version 2.6 and different encoding profiles and input resolutions. On average the 13900K is equal to the 7950X, if not slightly behind.

I agree with your assessment, and the reason is that Intel's SVT-AV1 encoder is the fastest software encoder and is particularly tuned for higher-thread-count CPUs. That's why Zen 4 is on average a bit quicker. Plus, I think SVT-AV1 can also use AVX-512 enhancements.

I don't know which AV1 encoder Handbrake is using.

However, that still doesn't negate my point. Raptor Lake is disproportionately good in these workloads thanks to its very high cache bandwidth which can sustain 3x 256 bit loads per cycle. Zen 4 has up to twice the big core count and by all means would be way out in front if it weren't for the cache bandwidth and execution advantage that RPL possesses.

Which goes back to my original assertion: Intel engineers aren't stupid. Some of you think Raptor Lake's die is inflated and bigger just because. But it's clear that Intel was pushing high throughput with Golden Cove and gave the architecture the cache bandwidth and execution units necessary to sustain those kinds of workloads at high speed.

Whenever Sapphire Rapids launches, it will be even more impressive because it can do 2x 512 bit loads per cycle. Zen 4's only hope will be to win on sheer core count advantage, which it should have.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136

I'm guessing that the Quick Sync marketing term is still used for the Intel Arc GPUs' hardware-accelerated encoding, because there was quite a bit of fanfare about the Intel Arc GPUs being the world's first GPUs to support hardware-accelerated AV1 encoding.

Intel Arc are the first GPUs with full support for AV1 - HardwarEsfera

In any case, even if by some miracle the Intel iGPU in RPL was able to do hardware AV1 encoding, it's very easy for the reviewers to turn that feature off. And they did stipulate in the graph that it was in software mode.
 

Abwx

Lifer
Apr 2, 2011
10,940
3,441
136
I'm guessing that the Quick Sync marketing term is still used for Intel Arc GPU's hardware accelerated encoding. Because there was quite a bit of fanfare about the Intel Arc GPUs being the World's first GPUs to support hardware accelerated AV1 encoding.

Intel Arc are the first GPUs with full support for AV1 - HardwarEsfera

In any case, even if by some miracle the Intel iGPU in RPL was able to do hardware AV1 encoding, it's very easy for the reviewers to turn that feature off. And they did stipulate in the graph that it was in software mode.


Dunno what can be turned off, but since you brought up the issue, there are the same Handbrake tests in the 7950X review.

Encoding H.264 to H.265 at 2160p, the 7950X is 60% faster than the 12900K, but going from H.264 to AV1 at 2160p, the 7950X's advantage shrinks to 18%; it's obvious that real CPU perf has nothing to do with such a discrepancy.

If the software supports Intel's QSV, and that's the case for Handbrake, then it will be enabled even when it's supposedly software-only encoding...

 
  • Like
Reactions: scineram

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
So is the encoding happens at the rate of 3 x 32B x 5Ghz = 480GB/s ? If not, execution pipeline will be the bottle neck, not the read bandwidth. Intermediate values are always stored in registers. For all core, Is it 480 *24 GB/s ?

Also if it is a multi threaded test, E-Cores with its 128 AVX can't touch Zen3/4 cores.

Do you think Intel engineers would be dumb enough to give the CPUs 3x 256-bit loads per cycle but not be able to execute on it? Come on now. AFAIK none of us here are actual CPU engineers, but many of us like to pretend we can criticize a CPU's architecture with due diligence. Raptor Cove is already known to be a very wide, 6-issue CPU with massive OoO resources, more than any other x86-64 CPU. So I don't get why you are so surprised it's faster than Zen 4, core for core, in this type of high-ILP workload.

As for the E-cores, they are just barely chipping in for this kind of workload, so it doesn't really matter.

QuickSync maybe used for decoding in this test.

Quick Sync is just a marketing term, so I think it applies to the discrete Intel Arc GPUs and not just the integrated GPUs. But as of now, the only GPUs that have hardware accelerated AV1 encoding are Intel Arc and Nvidia RTX 4000 series. RDNA3 should get it as well.

Meteor Lake will have hardware acceleration for AV1 encoding using the iGPU.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Dunno what can be turned off, but since you brought up the issue, there are the same Handbrake tests in the 7950X review.

Encoding H.264 to H.265 at 2160p, the 7950X is 60% faster than the 12900K, but going from H.264 to AV1 at 2160p, the 7950X's advantage shrinks to 18%; it's obvious that real CPU perf has nothing to do with such a discrepancy.

You guys are truly stubborn. I told you that AV1 relies heavily on SIMD performance, and ADL and RPL excel in that area. There's no other explanation. Other reviews show the same pattern with AV1 encoding. Look at Tom's Hardware's 13th-gen review:

[attached Tom's Hardware benchmark charts]


If the software supports Intel's QSV, and that's the case for Handbrake, then it will be enabled even when it's supposedly software-only encoding...


It wouldn't matter if it were enabled anyway, because none of the iGPUs can do hardware-accelerated AV1 encoding. The only GPUs that can encode AV1 in hardware are Intel Arc, the Nvidia RTX 4000 series, and likely the upcoming RDNA 3.

Meteor Lake should also have AV1 encoding in hardware.
 
  • Like
Reactions: Hulk

Abwx

Lifer
Apr 2, 2011
10,940
3,441
136
You guys are truly stubborn. I told you that AV1 relies on SIMD performance heavily and ADL and RPL excel in that area. There's no other explanation. Other reviews show the same pattern with AV1 encoding. Look at Tomshardware's 13th gen review:

[attached Tom's Hardware benchmark charts]




It wouldn't matter if it was enabled anyway because none of the iGPUs can do hardware accelerated AV1 encoding. The only GPUs that can do AV1 encoding in hardware are the Intel Arc, Nvidia RTX 4000 series and likely the upcoming RDNA 3.

Meteor Lake should also have AV1 encoding in hardware.


With this SVT-AV1 encoder the 7950X is barely faster than a 5950X, and both are well below a 12900K, yet you keep saying that it's down to real CPU perf...

So you are basically using an Intel-designed encoder that shows almost no improvement from the 5950X to the 7950X to claim better perf for RL. Have you any other such half-legged tests?

Besides, the 7950X has 60% better SIMD throughput than the 12900K according to SiSoft Sandra; so much for us not knowing about CPU uarch.
 

maddie

Diamond Member
Jul 18, 2010
4,738
4,667
136
Do you think Intel engineers would be dumb enough to give the CPUs 3x 256 bit loads per cycle but not be able to execute on it? Come on now. AFAIK, none of us here are actual CPU engineers but many of us like to pretend that we can criticize a CPU's architecture with due diligence. Raptor Cove is already known to be a very wide, 6 issue CPU with massive OoO resources......more than any x86-64 CPU. So I don't get why you are so surprised it's faster than Zen 4 in this type of high ILP workload core for core.

As for the E cores, they are just barely chipping in for this kind of workload so it doesn't really matter.



Quick Sync is just a marketing term, so I think it applies to the discrete Intel Arc GPUs and not just the integrated GPUs. But as of now, the only GPUs that have hardware accelerated AV1 encoding are Intel Arc and Nvidia RTX 4000 series. RDNA3 should get it as well.

Meteor Lake will have hardware acceleration for AV1 encoding using the iGPU.
So engineers can't screw up? Speaking as one, that is one of the dumbest things ever written, and using it in an argument to support your view is transparently dishonest.

Look around you: any mistakes or screw-ups apparent?
 

JustViewing

Member
Aug 17, 2022
135
232
76
Do you think Intel engineers would be dumb enough to give the CPUs 3x 256 bit loads per cycle but not be able to execute on it? Come on now. AFAIK, none of us here are actual CPU engineers but many of us like to pretend that we can criticize a CPU's architecture with due diligence. Raptor Cove is already known to be a very wide, 6 issue CPU with massive OoO resources......more than any x86-64 CPU. So I don't get why you are so surprised it's faster than Zen 4 in this type of high ILP workload core for core.

As for the E cores, they are just barely chipping in for this kind of workload so it doesn't really matter.



Quick Sync is just a marketing term, so I think it applies to the discrete Intel Arc GPUs and not just the integrated GPUs. But as of now, the only GPUs that have hardware accelerated AV1 encoding are Intel Arc and Nvidia RTX 4000 series. RDNA3 should get it as well.

Meteor Lake will have hardware acceleration for AV1 encoding using the iGPU.

I program a lot in x64 assembly as a hobby, including with AVX-256. Load/store bandwidth is hardly the limiting factor, as Zen can do 2 loads and 1 store per cycle. It only matters for basic streaming-type workloads (like taking 2 values, adding them together, and saving the result). Zen 4 is also 6-issue from the uop cache, and it has a big uop cache; most critical loops will run from it. Therefore, having a 6-wide decoder is not a big advantage.
 

Hulk

Diamond Member
Oct 9, 1999
4,214
2,006
136
So engineers can't screw up? Speaking as one, that is one of the dumbest things written and using it in an argument to support your view is transparently dishonest.

Look around you, any mistakes/screwups apparent?

I think his point isn't that engineers don't make mistakes; I'm one as well (an ME, and I don't claim to be an expert on microprocessor architecture), and we do. But it's one thing for a single engineer to have a design oversight while designing a hydraulics/pump system in North Jersey, where a pump is oversized and the project ends up costing more than it should have. Yes, that is wasted resources on one small project.

With a CPU we are talking about tens of engineers, designers, and architects going over the design, sometimes for years: using simulations, taking prior designs and their performance into account, and agonizing over every square mm of die space, because millions of these parts will be produced. As more engineers are added to a project, it's safe to assume the prospect of significant design flaws shrinks. So the point is that it is very unlikely that a feature as important as the one being discussed ended up being a "screw-up." Unlikely but not impossible, I will admit.

As is usually the case, it's hard to insert nuance into these discussions without actually speaking in person.
 

maddie

Diamond Member
Jul 18, 2010
4,738
4,667
136
I think his point isn't that engineers don't make mistakes, I'm one as well (ME, I don't assert to be an expert on microprocessor architecture) and we do. But it's one thing for a single engineer to have a design oversight while designing a hydraulics/pump system in North Jersey where a pump is oversized and the project cost ends up being more than it should have been. Yes, that is wasted resources on one small project.

With a CPU we are talking about tens of engineers/designers/architects going over the design for sometimes years, using simulations, taking prior designs/performance into account, and agonizing over every square mm of die space because millions of these parts will be produced. As additional engineers are added to a project I think it's safe to assume the prospect of significant design flaws is reduced. So I think the point is that is is very unlikely that a feature as important as the one being discussed ended up being a "screw up." Unlikely but not impossible I will admit to that!

As is usually the case it's hard to insert the nuance into these discussions without actually speaking in person.
One example in the CPU space that directly contradicts this reasoning is the Bulldozer design with CMT. AMD was a very successful company at the time, with top-class design teams, yet it happened: the wrong path taken, with disastrous consequences. There are more.
 
  • Like
Reactions: Tlh97 and KompuKare

Hitman928

Diamond Member
Apr 15, 2012
5,245
7,793
136
I think his point isn't that engineers don't make mistakes, I'm one as well (ME, I don't assert to be an expert on microprocessor architecture) and we do. But it's one thing for a single engineer to have a design oversight while designing a hydraulics/pump system in North Jersey where a pump is oversized and the project cost ends up being more than it should have been. Yes, that is wasted resources on one small project.

With a CPU we are talking about tens of engineers/designers/architects going over the design for sometimes years, using simulations, taking prior designs/performance into account, and agonizing over every square mm of die space because millions of these parts will be produced. As additional engineers are added to a project I think it's safe to assume the prospect of significant design flaws is reduced. So I think the point is that is is very unlikely that a feature as important as the one being discussed ended up being a "screw up." Unlikely but not impossible I will admit to that!

As is usually the case it's hard to insert the nuance into these discussions without actually speaking in person.

Willamette and Bulldozer show just how badly even groups of engineers can screw up. Usually it's caused, or compounded, by bad managerial pressure.
 

maddie

Diamond Member
Jul 18, 2010
4,738
4,667
136
Willamette and Bulldozer show just how badly even groups of engineers can screw up. Usually it's caused, or compounded, by bad managerial pressure.
True, but.

Ever heard the saying that it takes very intelligent people to screw up mightily? A strong-willed individual or group with solid but wrong arguments can do a lot of harm. We're living it.
 

Schmide

Diamond Member
Mar 7, 2002
5,586
718
126
I would say the extra E-cores do a lot to help Intel's performance, much more than any cache/memory bandwidth.

At the core of any modern compression algorithm is an FFT (AV1 uses a 32x32 2D FFT).

The memory used in this operation is 8 KB, which easily fits in the L1 cache (probably double that, as in-place FFT implementations do not lend themselves well to vectorization). In terms of unoptimized operations, an FFT of size 32 has O(N log2 N) = 32 * 5 = 160 complexity. Expanding to the 2D operation, where each row and each column is transformed, gives 160 * 64 = 10240 = ~10k complexity. Depending on the compression level, this can be reduced by at least a quarter (no need to finalize the edge bins). SIMD operations further reduce the number of operations but require the data to be transposed, which adds back some complexity. AVX and SSE help a fair amount, but probably only give 2-3x performance gains over non-vectorized implementations.

This is very much a computational bottleneck, where loads/stores represent a tiny fraction of the total operations. Best case: #loads (64-96) / #operations (10k) = 0.0064 - 0.0096.
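The figures above can be reproduced directly, taking the post's simplified butterfly count and 8-byte complex samples as given:

```python
import math

N = 32                               # one 32-point transform
ops_1d = N * int(math.log2(N))       # 32 * 5 = 160 butterfly operations
ops_2d = ops_1d * 2 * N              # 32 rows + 32 columns = 64 transforms
working_set = N * N * 8              # 32x32 complex-float samples, in bytes

print(ops_1d, ops_2d, working_set)   # 160 10240 8192

# Load-to-compute ratio, using the post's best-case 64-96 loads:
print(64 / ops_2d, 96 / ops_2d)
```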
 
Last edited:

DAPUNISHER

Super Moderator CPU Forum Mod and Elite Member
Super Moderator
Aug 22, 2001
28,452
20,463
146
While this has turned into a good popcorn thread, it is way off the reservation.

The premise of the thread, right there in the title, is based on outdated pricing, AKA fake news. The current pricing on Zen 4 CPUs invalidates the original assertions. The rest of the thread is retreading ground we have covered many times, accompanied by all the usual logical fallacies the desperate resort to when losing a debate. The wheel keeps turning.
 

JustViewing

Member
Aug 17, 2022
135
232
76
I would say the extra E-cores do a lot to help Intel's performance, much more than any cache/memory bandwidth.

At the core of any modern compression algorithm is an FFT (AV1 uses a 32x32 2D FFT).

The memory used in this operation is 8 KB, which easily fits in the L1 cache (probably double that, as in-place FFT implementations do not lend themselves well to vectorization). In terms of unoptimized operations, an FFT of size 32 has O(N log2 N) = 32 * 5 = 160 complexity. Expanding to the 2D operation, where each row and each column is transformed, gives 160 * 64 = 10240 = ~10k complexity. Depending on the compression level, this can be reduced by at least a quarter (no need to finalize the edge bins). SIMD operations further reduce the number of operations but require the data to be transposed, which adds back some complexity. AVX and SSE help a fair amount, but probably only give 2-3x performance gains over non-vectorized implementations.

This is very much a computational bottleneck, where loads/stores represent a tiny fraction of the total operations. Best case: #loads (64-96) / #operations (10k) = 0.0064 - 0.0096.

For FFTs, if the dataset is arranged in a favorable way, you can saturate the AVX execution units fully. You can achieve this by interleaving multiple FFT streams into a single processing section. I have done this in the past.
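The interleaving idea described above can be sketched in plain Python: lay out several independent streams column-major so that element i of every stream is contiguous, then each butterfly acts on a whole group at once (each list operation below stands in for one SIMD instruction). This is an illustrative sketch, not the poster's actual code:

```python
NUM_STREAMS, N = 4, 8

# Four independent length-8 input streams (array-of-streams layout).
streams = [[complex(s * 10 + i) for i in range(N)] for s in range(NUM_STREAMS)]

# Interleave into structure-of-arrays: element i of all streams sits together,
# so one SIMD vector could hold lane-parallel operands from all 4 streams.
lanes = [[streams[s][i] for s in range(NUM_STREAMS)] for i in range(N)]

# First radix-2 butterfly stage, applied to all streams simultaneously.
half = N // 2
for i in range(half):
    a, b = lanes[i], lanes[i + half]
    lanes[i]        = [x + y for x, y in zip(a, b)]
    lanes[i + half] = [x - y for x, y in zip(a, b)]

print(lanes[0][0], lanes[half][0])   # stream 0: 0+4 and 0-4
```

The layout transformation is the whole trick: with a structure-of-arrays layout, every butterfly becomes a dense vertical SIMD operation with no shuffles or transposes inside the inner loop.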
 
  • Like
Reactions: Schmide