Question How in the world has AMD got the Ryzen 7600X and 7700X priced the same when they are inferior, even comparing P cores only, to the 13600K and 13700K

Page 5 - Seeking answers? Join the AnandTech community: where nearly half-a-million members share solutions and discuss the latest tech.

JustViewing

Member
Aug 17, 2022
52
111
66
This isn't true. Under heavy compute loads, Raptor Lake starts to really take long strides. This is from the Computerbase.de review and shows how the 13900K is significantly ahead of the 7950x in AV1 encoding, finishing the 4K60 transcoding test in 16% less time. I find this mightily impressive because AV1 encoding is very compute heavy and RL only has 8 big cores which are shouldering the vast majority of the workload.

I think average workloads show less discrepancy because there is less ILP in the code. But when there's enough ILP RPL will pull ahead courtesy of its wider architecture and higher throughput.
Not sure about this benchmark; is it using Quick Sync? If not, it is most likely using AVX-128. Otherwise the E-cores will be beaten badly by 16 AVX-256 Zen 3/4 cores.

Having said that, I am not denying there can be outliers like this. But looking at the result, it doesn't seem correct. It may have "Intel optimization".

 
  • Like
Reactions: Tlh97 and scineram

Thunder 57

Platinum Member
Aug 19, 2007
2,014
2,523
136
This isn't true. Under heavy compute loads, Raptor Lake starts to really take long strides. This is from the Computerbase.de review and shows how the 13900K is significantly ahead of the 7950x in AV1 encoding, finishing the 4K60 transcoding test in 16% less time. I find this mightily impressive because AV1 encoding is very compute heavy and RL only has 8 big cores which are shouldering the vast majority of the workload.

I think average workloads show less discrepancy because there is less ILP in the code. But when there's enough ILP RPL will pull ahead courtesy of its wider architecture and higher throughput.

Anybody can cherry-pick. If anything, I expect Zen 3/4 to do better in H.265 and AV1 since they use AVX much more heavily than H.264/AVC. I can speculate as well.

I'm talking about IPC, not overall performance. The rumors before Zen 4 launched were that it would have a brutal IPC increase of 25% and up.

Probably because nearly the entire forum is an AMD echo chamber. :D And no house is being burned down with RL, let's not get carried away here.

I just undervolted mine some more and it's running at 77C package temp at 5.2GHz on air cooling.
Nice straw man.

I aim to please. And let's be honest, one-sided views are typically boring and don't keep the forum alive. Conflict and dissent are the bread and butter of any forum.
Agreed. Thank you for providing a different viewpoint. No one (should) want to live in an echo chamber. Too much of that going on lately.

I am all for competition; without AMD, Intel would not have released the Alder Lake i3 at such a low price. However, I miss those days when AMD cut its CPU prices in half even before Intel released their latest CPU (was it Conroe?). AMD nowadays is slow to react to the threat from Intel, which is ironic. :D

Intel is going to announce 13th gen i5 desktop CPUs for the sub-$200 market, and this time they are bundling 4 E-cores with 6 P-cores (these should be Alder Lake dies). I am now wondering how much AMD is going to charge for the upcoming 7600 with only 6 cores... :cool:
Maybe they cut the prices of the NetBurst CPUs in half because Conroe beat the snot out of them in every way. Conroe did a good job on AMD as well, outside of server; Intel really needed Nehalem to win that one back.

Also, from what I have read, the upcoming 13th gen is all just a refresh of Alder Lake. It would be nice to be wrong, but we shall see.
 
  • Like
Reactions: Tlh97

Abwx

Diamond Member
Apr 2, 2011
9,814
2,250
136
This isn't true. Under heavy compute loads, Raptor Lake starts to really take long strides. This is from the Computerbase.de review and shows how the 13900K is significantly ahead of the 7950x in AV1 encoding, finishing the 4K60 transcoding test in 16% less time. I find this mightily impressive because AV1 encoding is very compute heavy and RL only has 8 big cores which are shouldering the vast majority of the workload.

I think average workloads show less discrepancy because there is less ILP in the code. But when there's enough ILP RPL will pull ahead courtesy of its wider architecture and higher throughput.

The numbers for AV1 are dubious at best, and there's likely GPU acceleration, like in Adobe Premiere, or a dedicated hardware block. In their regular Handbrake test Zen 4 is 13% ahead at stock settings; how could RL suddenly be that much better with the same app and another video format?

 
  • Like
Reactions: Tlh97 and scineram

Carfax83

Diamond Member
Nov 1, 2010
6,515
1,300
126
Not sure about this benchmark; is it using Quick Sync? If not, it is most likely using AVX-128. Otherwise the E-cores will be beaten badly by 16 AVX-256 Zen 3/4 cores.

Having said that, I am not denying there can be outliers like this. But looking at the result, it doesn't seem correct. It may have "Intel optimization".

It's not using Quick Sync, it's software; it says so right there in the graph if you use the translate function. As far as I know, Quick Sync doesn't even support AV1 encoding. Intel's DG2, which is used in its Arc GPUs, does support AV1 encoding, however.

The reason why Intel pulls ahead is as I said, because it has higher SIMD throughput and can do 3x 256 bit loads per cycle (plus it has the cache bandwidth to support it) while Zen 4 does 2x 256 bit loads per cycle. Zen 4 and Zen 3 have the same AVX throughput.

The 13700K is faster than the 12900KS due to having more cache bandwidth and better prefetch.

I can't believe you guys are surprised at this. If you read any of the architectural analysis for Golden Cove and Zen 4, you'd know exactly why this benchmark is the way it is.
 
  • Haha
Reactions: lobz

Carfax83

Diamond Member
Nov 1, 2010
6,515
1,300
126
Anybody can cherry-pick. If anything, I expect Zen 3/4 to do better in H.265 and AV1 since they use AVX much more heavily than H.264/AVC. I can speculate as well.
But why would you expect them to do better when they can only do 2x 256-bit loads per cycle while RPL can do 3x 256-bit loads per cycle and sustain it? That's why RPL is so fast in HEVC and especially AV1.

AV1 uses SIMD optimization much more heavily than H.265, as it's a newer standard.

Agreed. Thank you for providing a different viewpoint. No one (should) want to live in an echo chamber. Too much of that going on lately.
Absolute truth right here, and not just talking about tech stuff either.
 

Carfax83

Diamond Member
Nov 1, 2010
6,515
1,300
126
The numbers for AV1 are dubious at best, and there's likely GPU acceleration, like in Adobe Premiere, or a dedicated hardware block. In their regular Handbrake test Zen 4 is 13% ahead at stock settings; how could RL suddenly be that much better with the same app and another video format?
Intel's Quick Sync does not support AV1 encoding. I checked.

The reason why AV1 performance is better on RPL is because AV1 leans heavily on SIMD throughput, for which RPL has an advantage. RPL can do 3x 256 bit loads per cycle while Zen 4 can only do 2x 256 bit loads per cycle. RPL can also sustain that throughput because the cache bandwidths are much greater than Zen 4 and even Alder Lake on a per core basis.

In any kind of workload with high amounts of ILP, Raptor Lake generally pulls ahead because the architecture is wider and has more throughput.
 

coercitiv

Diamond Member
Jan 24, 2014
5,260
8,603
136
The reason why AV1 performance is better on RPL is because AV1 leans heavily on SIMD throughput, for which RPL has an advantage. RPL can do 3x 256 bit loads per cycle while Zen 4 can only do 2x 256 bit loads per cycle. RPL can also sustain that throughput because the cache bandwidths are much greater than Zen 4 and even Alder Lake on a per core basis.
https://openbenchmarking.org/test/pts/svt-av1

Testing was done using the SVT-AV1 open source encoder originally developed by Intel. Take your time; go through the various tests using version 2.6 and different encoding profiles and input resolutions. On average the 13900K is equal to the 7950X, if not slightly behind.
 

JustViewing

Member
Aug 17, 2022
52
111
66
It's not using Quick Sync, it's software; it says so right there in the graph if you use the translate function. As far as I know, Quick Sync doesn't even support AV1 encoding. Intel's DG2, which is used in its Arc GPUs, does support AV1 encoding, however.

The reason why Intel pulls ahead is as I said, because it has higher SIMD throughput and can do 3x 256 bit loads per cycle (plus it has the cache bandwidth to support it) while Zen 4 does 2x 256 bit loads per cycle. Zen 4 and Zen 3 have the same AVX throughput.

The 13700K is faster than the 12900KS due to having more cache bandwidth and better prefetch.

I can't believe you guys are surprised at this. If you read any of the architectural analysis for Golden Cove and Zen 4, you'd know exactly why this benchmark is the way it is.
So does the encoding happen at a rate of 3 x 32B x 5GHz = 480GB/s? If not, the execution pipeline will be the bottleneck, not the load bandwidth; intermediate values are always kept in registers. And across all cores, is it 480 x 24 GB/s?

Also, if it is a multi-threaded test, the E-cores with their 128-bit AVX can't touch Zen 3/4 cores.

Quick Sync may be used for decoding in this test.
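The back-of-the-envelope numbers in this post can be sanity-checked in a few lines. This is only a sketch using the post's own figures: the 3 loads per cycle is the claim under discussion, and the 5GHz clock is assumed, not measured.

```python
# Theoretical peak L1 load bandwidth implied by the post's figures.
loads_per_cycle = 3      # claimed RPL P-core: 3x 256-bit loads per cycle
bytes_per_load = 32      # 256 bits = 32 bytes
clock_ghz = 5.0          # assumed P-core clock from the post

per_core_gb_s = loads_per_cycle * bytes_per_load * clock_ghz
print(per_core_gb_s)     # 480.0 GB/s per core, a theoretical ceiling only
```

As the post argues, an encoder never sustains anything close to that ceiling, so the execution pipeline, not the load ports, is the more plausible limiter.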
 
  • Like
Reactions: scineram

Carfax83

Diamond Member
Nov 1, 2010
6,515
1,300
126
https://openbenchmarking.org/test/pts/svt-av1

Testing was done using the SVT-AV1 open source encoder originally developed by Intel. Take your time; go through the various tests using version 2.6 and different encoding profiles and input resolutions. On average the 13900K is equal to the 7950X, if not slightly behind.
I agree with your assessment, and the reason is that Intel's SVT-AV1 encoder is the fastest software encoder and is particularly tuned for higher thread count CPUs. That's why Zen 4 is on average a bit quicker. Plus, I think SVT-AV1 can use AVX-512 enhancements as well.

I don't know which AV1 encoder Handbrake is using.

However, that still doesn't negate my point. Raptor Lake is disproportionately good in these workloads thanks to its very high cache bandwidth, which can sustain 3x 256-bit loads per cycle. Zen 4 has up to twice the big-core count and by all rights would be way out in front if it weren't for the cache bandwidth and execution advantage that RPL possesses.

Which goes back to my original assertion: Intel engineers aren't stupid. Some of you think Raptor Lake's die is inflated and bigger just because..... But it's clear that Intel was pushing high throughput in Golden Cove and gave the architecture the cache bandwidth and execution units necessary to sustain those kinds of workloads at high speed.

Whenever Sapphire Rapids launches, it will be even more impressive because it can do 2x 512 bit loads per cycle. Zen 4's only hope will be to win on sheer core count advantage, which it should have.
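Taking the per-cycle load figures claimed across these posts at face value (these are the thread's claims, not measurements), the per-core gap is easy to tabulate:

```python
# Peak L1 load bytes per cycle implied by the claims in this thread.
claims = {
    "Raptor Lake P-core": (256, 3),  # 3x 256-bit loads/cycle (claimed)
    "Zen 4":              (256, 2),  # 2x 256-bit loads/cycle (claimed)
    "Sapphire Rapids":    (512, 2),  # 2x 512-bit loads/cycle (claimed)
}
for name, (bits, loads) in claims.items():
    print(f"{name}: {bits // 8 * loads} bytes/cycle")
# Raptor Lake P-core: 96, Zen 4: 64, Sapphire Rapids: 128
```

Whether that 96 vs 64 bytes/cycle gap actually matters for an encoder is exactly the point being contested in the posts that follow.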
 

Carfax83

Diamond Member
Nov 1, 2010
6,515
1,300
126
I'm guessing that the Quick Sync marketing term is still used for Intel Arc GPU's hardware accelerated encoding. Because there was quite a bit of fanfare about the Intel Arc GPUs being the World's first GPUs to support hardware accelerated AV1 encoding.

Intel Arc are the first GPUs with full support for AV1 - HardwarEsfera

In any case, even if by some miracle the Intel iGPU in RPL was able to do hardware AV1 encoding, it's very easy for the reviewers to turn that feature off. And they did stipulate in the graph that it was in software mode.
 

Abwx

Diamond Member
Apr 2, 2011
9,814
2,250
136
I'm guessing that the Quick Sync marketing term is still used for Intel Arc GPU's hardware accelerated encoding. Because there was quite a bit of fanfare about the Intel Arc GPUs being the World's first GPUs to support hardware accelerated AV1 encoding.

Intel Arc are the first GPUs with full support for AV1 - HardwarEsfera

In any case, even if by some miracle the Intel iGPU in RPL was able to do hardware AV1 encoding, it's very easy for the reviewers to turn that feature off. And they did stipulate in the graph that it was in software mode.

Dunno what can be turned off, but since you brought up the issue, there are the same Handbrake tests in the 7950X review.

Encoding H264 to H265 at 2160p, the 7950X is 60% faster than the 12900K, but once it's H264 to AV1 at 2160p the 7950X's advantage shrinks to 18%; it's obvious that real CPU perf has nothing to do with such a discrepancy.

If the software supports Intel's QSV, and that's the case for Handbrake, then it will be enabled even when it's supposedly software-only encoding...

 
  • Like
Reactions: scineram

Carfax83

Diamond Member
Nov 1, 2010
6,515
1,300
126
So does the encoding happen at a rate of 3 x 32B x 5GHz = 480GB/s? If not, the execution pipeline will be the bottleneck, not the load bandwidth; intermediate values are always kept in registers. And across all cores, is it 480 x 24 GB/s?

Also, if it is a multi-threaded test, the E-cores with their 128-bit AVX can't touch Zen 3/4 cores.
Do you think Intel engineers would be dumb enough to give the CPU 3x 256-bit loads per cycle but not be able to execute on it? Come on now. AFAIK, none of us here are actual CPU engineers, but many of us like to pretend that we can criticize a CPU's architecture with due diligence. Raptor Cove is already known to be a very wide, 6-issue CPU with massive OoO resources......more than any other x86-64 CPU. So I don't get why you are so surprised it's faster than Zen 4 in this type of high-ILP workload, core for core.

As for the E cores, they are just barely chipping in for this kind of workload so it doesn't really matter.

QuickSync maybe used for decoding in this test.
Quick Sync is just a marketing term, so I think it applies to the discrete Intel Arc GPUs and not just the integrated GPUs. But as of now, the only GPUs that have hardware accelerated AV1 encoding are Intel Arc and Nvidia RTX 4000 series. RDNA3 should get it as well.

Meteor Lake will have hardware acceleration for AV1 encoding using the iGPU.
 

Carfax83

Diamond Member
Nov 1, 2010
6,515
1,300
126
Dunno what can be turned off, but since you brought up the issue, there are the same Handbrake tests in the 7950X review.

Encoding H264 to H265 at 2160p, the 7950X is 60% faster than the 12900K, but once it's H264 to AV1 at 2160p the 7950X's advantage shrinks to 18%; it's obvious that real CPU perf has nothing to do with such a discrepancy.
You guys are truly stubborn. I told you that AV1 relies heavily on SIMD performance, and ADL and RPL excel in that area. There's no other explanation. Other reviews show the same pattern with AV1 encoding. Look at Tom's Hardware's 13th gen review:





If the software supports Intel's QSV, and that's the case for Handbrake, then it will be enabled even when it's supposedly software-only encoding...

It wouldn't matter if it was enabled anyway because none of the iGPUs can do hardware accelerated AV1 encoding. The only GPUs that can do AV1 encoding in hardware are the Intel Arc, Nvidia RTX 4000 series and likely the upcoming RDNA 3.

Meteor Lake should also have AV1 encoding in hardware.
 
  • Like
Reactions: Hulk

Abwx

Diamond Member
Apr 2, 2011
9,814
2,250
136
You guys are truly stubborn. I told you that AV1 relies heavily on SIMD performance, and ADL and RPL excel in that area. There's no other explanation. Other reviews show the same pattern with AV1 encoding. Look at Tom's Hardware's 13th gen review:







It wouldn't matter if it was enabled anyway because none of the iGPUs can do hardware accelerated AV1 encoding. The only GPUs that can do AV1 encoding in hardware are the Intel Arc, Nvidia RTX 4000 series and likely the upcoming RDNA 3.

Meteor Lake should also have AV1 encoding in hardware.

With this SVT-AV1 encoder the 7950X is barely faster than a 5950X, and both are well below a 12900K, yet you keep saying that it's down to real CPU perf...

So you basically used an Intel-designed encoder that shows next to no improvement from 5950X to 7950X to claim better perf for RL; have you any other such half-legged tests..?

Besides, the 7950X has 60% better SIMD throughput than the 12900K according to SiSoft's Sandra, so much for us not knowing about CPU uarchs.
 

maddie

Diamond Member
Jul 18, 2010
4,333
3,869
136
Do you think Intel engineers would be dumb enough to give the CPU 3x 256-bit loads per cycle but not be able to execute on it? Come on now. AFAIK, none of us here are actual CPU engineers, but many of us like to pretend that we can criticize a CPU's architecture with due diligence. Raptor Cove is already known to be a very wide, 6-issue CPU with massive OoO resources......more than any other x86-64 CPU. So I don't get why you are so surprised it's faster than Zen 4 in this type of high-ILP workload, core for core.

As for the E cores, they are just barely chipping in for this kind of workload so it doesn't really matter.



Quick Sync is just a marketing term, so I think it applies to the discrete Intel Arc GPUs and not just the integrated GPUs. But as of now, the only GPUs that have hardware accelerated AV1 encoding are Intel Arc and Nvidia RTX 4000 series. RDNA3 should get it as well.

Meteor Lake will have hardware acceleration for AV1 encoding using the iGPU.
So engineers can't screw up? Speaking as one, that is one of the dumbest things written here, and using it in an argument to support your view is transparently dishonest.

Look around you, any mistakes/screwups apparent?
 

JustViewing

Member
Aug 17, 2022
52
111
66
Do you think Intel engineers would be dumb enough to give the CPU 3x 256-bit loads per cycle but not be able to execute on it? Come on now. AFAIK, none of us here are actual CPU engineers, but many of us like to pretend that we can criticize a CPU's architecture with due diligence. Raptor Cove is already known to be a very wide, 6-issue CPU with massive OoO resources......more than any other x86-64 CPU. So I don't get why you are so surprised it's faster than Zen 4 in this type of high-ILP workload, core for core.

As for the E cores, they are just barely chipping in for this kind of workload so it doesn't really matter.



Quick Sync is just a marketing term, so I think it applies to the discrete Intel Arc GPUs and not just the integrated GPUs. But as of now, the only GPUs that have hardware accelerated AV1 encoding are Intel Arc and Nvidia RTX 4000 series. RDNA3 should get it as well.

Meteor Lake will have hardware acceleration for AV1 encoding using the iGPU.
I program a lot in x64 assembly as a hobby, including using AVX-256. Load/store bandwidth is hardly the limiting factor, as Zen can do 2 loads and 1 store per cycle. It only matters for basic streaming-type workloads (like taking 2 values, adding them together and saving the result). Zen 4 is also 6-issue from the uop cache, and it has a big uop cache; most critical loops will run from the uop cache. Therefore, having a 6-wide decoder is not a big advantage.
 
  • Like
Reactions: Tlh97 and scineram

Hulk

Diamond Member
Oct 9, 1999
3,571
1,171
136
So engineers can't screw up? Speaking as one, that is one of the dumbest things written and using it in an argument to support your view is transparently dishonest.

Look around you, any mistakes/screwups apparent?
I think his point isn't that engineers don't make mistakes; I'm one as well (an ME, and I don't claim to be an expert on microprocessor architecture) and we do. But it's one thing for a single engineer to have a design oversight while designing a hydraulics/pump system in North Jersey, where a pump is oversized and the project ends up costing more than it should have. Yes, that is wasted resources on one small project.

With a CPU we are talking about tens of engineers/designers/architects going over the design for sometimes years, using simulations, taking prior designs/performance into account, and agonizing over every square mm of die space because millions of these parts will be produced. As additional engineers are added to a project, I think it's safe to assume the prospect of significant design flaws is reduced. So I think the point is that it is very unlikely that a feature as important as the one being discussed ended up being a "screw up." Unlikely but not impossible, I will admit to that!

As is usually the case it's hard to insert the nuance into these discussions without actually speaking in person.
 
  • Like
Reactions: Carfax83

maddie

Diamond Member
Jul 18, 2010
4,333
3,869
136
I think his point isn't that engineers don't make mistakes; I'm one as well (an ME, and I don't claim to be an expert on microprocessor architecture) and we do. But it's one thing for a single engineer to have a design oversight while designing a hydraulics/pump system in North Jersey, where a pump is oversized and the project ends up costing more than it should have. Yes, that is wasted resources on one small project.

With a CPU we are talking about tens of engineers/designers/architects going over the design for sometimes years, using simulations, taking prior designs/performance into account, and agonizing over every square mm of die space because millions of these parts will be produced. As additional engineers are added to a project, I think it's safe to assume the prospect of significant design flaws is reduced. So I think the point is that it is very unlikely that a feature as important as the one being discussed ended up being a "screw up." Unlikely but not impossible, I will admit to that!

As is usually the case it's hard to insert the nuance into these discussions without actually speaking in person.
One example in the CPU space that directly contradicts this reasoning is the Bulldozer design with CMT. AMD was a very successful company at the time, with top-class design teams, yet it happened: the wrong path taken, with disastrous consequences. There are more.
 
  • Like
Reactions: Tlh97 and KompuKare

Hitman928

Diamond Member
Apr 15, 2012
4,238
5,367
136
I think his point isn't that engineers don't make mistakes; I'm one as well (an ME, and I don't claim to be an expert on microprocessor architecture) and we do. But it's one thing for a single engineer to have a design oversight while designing a hydraulics/pump system in North Jersey, where a pump is oversized and the project ends up costing more than it should have. Yes, that is wasted resources on one small project.

With a CPU we are talking about tens of engineers/designers/architects going over the design for sometimes years, using simulations, taking prior designs/performance into account, and agonizing over every square mm of die space because millions of these parts will be produced. As additional engineers are added to a project, I think it's safe to assume the prospect of significant design flaws is reduced. So I think the point is that it is very unlikely that a feature as important as the one being discussed ended up being a "screw up." Unlikely but not impossible, I will admit to that!

As is usually the case it's hard to insert the nuance into these discussions without actually speaking in person.
Willamette and Bulldozer show just how badly even groups of engineers can screw up. Usually it's caused by, or compounded by, bad managerial pressure.
 

maddie

Diamond Member
Jul 18, 2010
4,333
3,869
136
Willamette and Bulldozer show just how badly even groups of engineers can screw up. Usually it's caused by, or compounded by, bad managerial pressure.
True, but.

Ever heard the saying that it takes very intelligent people to screw up mightily? A strong-willed individual or group with solid but wrong arguments can do a lot of harm. We're living it.
 

Schmide

Diamond Member
Mar 7, 2002
5,495
509
126
I would say the extra E-cores do a lot to help Intel's performance, much more than any cache/memory bandwidth.

At the core of any modern compression algorithm is an FFT (AV1 is a 32x32 2D FFT).

The memory used in this operation is 8k, which easily fits in the L1 cache (probably double that, as in-place FFT implementations do not lend themselves well to vectorized FFTs). In terms of unoptimized operations, an FFT of 32 has O(N log2 N) = 32 * 5 = 160 complexity. Expanding out to the 2D operation, where each row and column is operated on, gives 160 * 64 = 10240 = ~10k complexity. Depending on the compression level this can be reduced by at least a quarter (no need to finalize the edge bins). SIMD operations further reduce the number of operations but require the data to be transposed, adding back in some complexity. AVX and SSE help a fair amount, but probably only give 2-3x performance gains over non-vectorized implementations.

This is a very compute-bound operation where loads/stores represent a tiny fraction of the total operations. Best case: #loads (64-96) / #operations (10k) = 0.0064 - 0.0096.
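The operation-count estimate above can be reproduced directly. This is only a sketch of the post's own arithmetic; the 64-96 load count is the post's assumption, not something measured.

```python
import math

# One 32-point FFT: N * log2(N) = 32 * 5 = 160 butterfly-level operations.
N = 32
row_ops = N * int(math.log2(N))

# 2D transform operates on every row and every column: 160 * 64 = 10240.
total_ops = row_ops * (N + N)
print(total_ops)  # 10240 compute ops per 32x32 block, i.e. ~10k

# Ratio of loads to compute ops, using the post's 64-96 load estimate;
# comes out around 0.006-0.009, a tiny fraction, supporting the
# "compute-bound, not load-bound" argument.
lo_ratio = 64 / total_ops
hi_ratio = 96 / total_ops
```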
 
Last edited:

DAPUNISHER

Super Moderator and Elite Member
Moderator
Aug 22, 2001
25,143
10,076
146
While this has turned into a good popcorn thread, it is way off the reservation.

The premise of the thread, right there in the title, is based on outdated pricing, AKA fake news. The current pricing on Zen 4 CPUs invalidates the original assertions that were made. The rest of the thread is retreading ground we have covered many times. Accompanied by all the usual logical fallacies the desperate resort to when losing the debate. The wheel keeps turning.
 

JustViewing

Member
Aug 17, 2022
52
111
66
I would say the extra E-cores do a lot to help Intel's performance, much more than any cache/memory bandwidth.

At the core of any modern compression algorithm is an FFT (AV1 is a 32x32 2D FFT).

The memory used in this operation is 4k, which easily fits in the L1 cache (probably double that, as in-place FFT implementations do not lend themselves well to vectorized FFTs). In terms of unoptimized operations, an FFT of 32 has O(N log2 N) = 32 * 5 = 160 complexity. Expanding out to the 2D operation, where each row and column is operated on, gives 160 * 64 = 10240 = ~10k complexity. Depending on the compression level this can be reduced by at least a quarter (no need to finalize the edge bins). SIMD operations further reduce the number of operations but require the data to be transposed, adding back in some complexity. AVX and SSE help a fair amount, but probably only give 2-3x performance gains over non-vectorized implementations.

This is a very compute-bound operation where loads/stores represent a tiny fraction of the total operations. Best case: #loads (64-96) / #operations (10k) = 0.0064 - 0.0096.
For FFT, if the dataset is arranged in a favorable way, you can saturate the AVX execution units to the fullest. You can achieve this by interleaving multiple streams of FFT into a single processing section. I have done this in the past.
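A toy pure-Python sketch of that interleaving idea (hypothetical code; the SIMD lanes are simulated by an inner loop rather than actual AVX registers): each butterfly step is applied at the same index across several streams in lockstep, which is exactly the data layout that lets a vectorized build keep every lane busy.

```python
import cmath

def fft_batch(streams):
    """Radix-2 DIT FFT on several equal-length streams in lockstep.

    The inner 'lane' loop stands in for SIMD lanes: every stream's
    butterfly at index k shares the same twiddle factor, so a real
    AVX implementation could process all lanes with one vector op.
    """
    n = len(streams[0])
    if n == 1:
        return [list(s) for s in streams]
    evens = fft_batch([s[0::2] for s in streams])
    odds = fft_batch([s[1::2] for s in streams])
    out = [[0j] * n for _ in streams]
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)   # shared twiddle factor
        for lane in range(len(streams)):        # one iteration per SIMD lane
            t = w * odds[lane][k]
            out[lane][k] = evens[lane][k] + t
            out[lane][k + n // 2] = evens[lane][k] - t
    return out

# Two interleaved 4-point streams transformed together in one pass.
impulse, shifted = [1, 0, 0, 0], [0, 1, 0, 0]
result = fft_batch([impulse, shifted])
print([abs(x) for x in result[0]])  # [1.0, 1.0, 1.0, 1.0]: flat impulse spectrum
```

The point of the layout is that the twiddle factor and index pattern are identical across lanes, so no shuffles or transposes are needed inside the butterfly, which is what makes the AVX units saturate.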
 
  • Like
Reactions: Schmide
