Question Intel 12th to 13th generation performance comparison

GunsMadeAmericaFree · Dec 14, 2022

I thought this was an interesting read - benchmark comparisons between Intel 12th generation & 13th generation:

Article with details

That's an average performance increase of 47% from one generation to the next. I wonder if AMD will have a similar increase?

LightningZ71 · Dec 20, 2022

Carfax83 said:
Another reason why SMT is useful with contemporary x86-64 CPUs is that those cores are clocked very high, close to 6 GHz, or right at 6 GHz for the upcoming 13900KS.

High clock speeds mean increased branch misprediction and pipeline stall penalties as well as increased memory latency, both of which SMT helps to mitigate. Arm CPUs tend to be clocked much lower than comparable x86-64 designs and probably wouldn't benefit from SMT as much.

Some people on this forum act as though Intel and AMD engineers are incompetent and don't know what they are doing. There's a reason why SMT has been used for such a long time, and the benefits typically outweigh any of the drawbacks. I don't even bother turning it off for just a few percentage points increase in whatever application or game.

Branch mispredict penalties for a single thread are the same on a given processor whether or not it has SMT enabled, assuming that each thread has its own predictor or that any shared predictor isn't being negatively affected by an independent thread on the second instruction stream. What SMT enables is more efficient use of overall processor resources: a second instruction stream from another thread uses resources in the core that the first thread isn't using. It can also let the second thread use pipeline stages that sit idle because of a mispredict bubble.

In other words, assuming the same processor design, having SMT enabled doesn't make the first thread any more resistant to the throughput limits imposed by branch mispredicts. In fact, a processor that's optimized for SMT often gives the two threads independent branch predictors, duplicating a lot of logic and reducing the resources that could be allocated to either one if it lived in isolation on that processor. The end result is a predictor that isn't quite as good as it could be if it didn't have to share space with a second one, or, in a shared design, didn't have to spend resources enabling its own sharing.

Memory latency is still going to be the same for any single thread (give or take contention between threads in an SMT design) whether or not the core is SMT enabled.

I have plenty of faith in Intel and AMD engineers. It's the marketing and C-suite executives who, as we all know full well, will sometimes choose advertisable stats that beat the competition at the expense of technical elegance.

LightningZ71 · Dec 20, 2022

JustViewing said:
Sometimes HT gives a near 100% boost. This happens in enterprise applications written in either DotNet or Java. I have personally observed this many times. The "Review Benchmark" applications are optimized for performance, so they don't scale well with HT. For the average enterprise application, code maintainability is the main concern. Those developers usually don't care about the CPU cache hit rate. There is built-in functionality in DotNet, for example, to easily introduce multithreading to sections of code. So in essence, poorly optimized/memory-dependent applications can greatly benefit from HT.

At the other extreme, when I wrote applications in ASM there was no or even negative scaling with HT. This is because when writing ASM you always have the cache limits in the back of your mind.

So, to be clear, you are referencing hand-written assembly for targeted enterprise-level applications and not the general usage applications typically found in most desktop environments? I readily concede that HT on P-cores has its place in the server racks of the world, but so would servers with 4 times the E-cores in many environments. Given the choice between twice the threads, each at 70+% of the performance of the fastest thread, or an alternative where half the threads run at 100% and the other half typically at 30% (but sometimes near 100% in very specific cases), it would seem to me that 300% is better than somewhere between 130% and 200%...
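The 300% vs. 130%-200% comparison above is plain arithmetic. A minimal sketch, assuming (as the post does) that four E-cores fit in roughly the die area of one P-core, and using a hypothetical 0.75 for "70+%"; neither figure is a measurement.

```python
# Relative throughput, in units of one P-core primary thread (= 1.0).
# Hypothetical assumption from the post: four E-cores ~ one P-core of area.

E_CORE_PERF = 0.75       # hypothetical per-E-core performance ("70+%")
E_CORES_PER_P_AREA = 4   # hypothetical area equivalence


def e_core_cluster_throughput(per_core=E_CORE_PERF, cores=E_CORES_PER_P_AREA):
    """Total throughput of the E-cores that fit in one P-core's area."""
    return per_core * cores


def p_core_smt_throughput(second_thread):
    """One P-core: its primary thread (1.0) plus the SMT sibling's share."""
    return 1.0 + second_thread


print(f"4 E-cores: {e_core_cluster_throughput():.0%}")
for sibling in (0.30, 1.00):  # typical vs. best case quoted in the post
    print(f"P-core + SMT sibling at {sibling:.0%}: {p_core_smt_throughput(sibling):.0%}")
```

Changing `E_CORE_PERF` shows how sensitive the conclusion is to the assumed E-core speed.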

Kocicak · Dec 20, 2022

Carfax83 said:
... Some people on this forum act as though Intel and AMD engineers are incompetent and don't know what they are doing. There's a reason why SMT has been used for such a long time, and the benefits typically outweigh any of the drawbacks. ...

SMT made perfect sense in the old times when PCs were limited to 1-4 cores.

SMT still makes perfect sense for professional, heavily multithreaded applications, but there the primary benefit comes from the number of physical cores; SMT just helps squeeze even more performance out of those many cores. The ideal CPUs for such workloads now are AMD Threadripper and server CPUs.

For normal PCs, where consumers who need them can get 16 cores (full cores in AMD CPUs) or 24 (full and compact cores in Intel CPUs), SMT is no longer needed.

Theoretical arguments are kind of useless when you can see for yourself how powerful, and how prioritized, the second threads really are: not much.

I have stated three or four times already that the second threads in the 13900K are the least powerful, the last ones to be used, and add just the last 13 percent of MT performance.

If somebody really needs that much MT performance, he should get a Threadripper or another server CPU. He is severely limited by a consumer CPU and platform.

Normal people need a few powerful threads, plus some more for occasional loads that can use them. (BTW, I like the P+E cores approach a lot.)

Professionals should get a proper workstation with a Threadripper or another server CPU with a lot of cores, or a server with such CPUs.

Anybody between these two is an anomaly and should not exist, and if they exist and complain about lack of MT in CPUs for normal people, they should not be listened to. They should be silent, work hard and save money for CPUs they really need.

Thunder 57 · Dec 20, 2022

We shouldn't be getting rid of SMT; we should be going to SMT4, as one member here once loudly proclaimed. Also, @Kocicak, does your bold key ever get stuck?

Markfw · Dec 20, 2022

Kocicak said:
... For normal PCs, where consumers who need them can get 16 cores (full cores in AMD CPUs) or 24 (full and compact cores in Intel CPUs), SMT is no longer needed. ...

... Anybody between these two is an anomaly and should not exist, and if they exist and complain about lack of MT in CPUs for normal people, they should not be listened to. They should be silent, work hard and save money for CPUs they really need.

A lot of NORMAL people do some encoding and such. They get a big benefit from SMT. If I thought about it more, I am sure I could come up with a list of things that NORMAL people do that benefit from SMT. You have made your opinion clear, and it's just NOT NORMAL, so why don't you drop the whole thing? You will not change the world just by ranting.

JustViewing · Dec 20, 2022

LightningZ71 said:
So, to be clear, you are referencing hand-written assembly for targeted enterprise level applications and not general usage applications that are typically found in most desktop environments?

No, the assembly part refers to my own hobby applications, which I used as a comparison with high-level applications. Nobody sane would write enterprise applications in assembly.

In servers you don't want a hybrid approach. It should be all P-cores or all E-cores. When you are doing VM/Docker-type deployments, you don't want a hybrid approach with unpredictable performance.
The large P-core is Intel's own problem. Relatively speaking, AMD's cores are (I think) much smaller than Intel's P-cores.

JustViewing · Dec 20, 2022

Kocicak said:
SMT made perfect sense in the old times when PCs were limited to 1-4 cores. ...

... For normal PCs, where consumers who need them can get 16 cores (full cores in AMD CPUs) or 24 (full and compact cores in Intel CPUs), SMT is no longer needed. ...

... Normal people need a few powerful threads, plus some more for occasional loads that can use them. (BTW, I like the P+E cores approach a lot.) ...

So what is the harm in having HT if it increases the die size by a tiny bit? HT is more important as P-cores get wider. I agree that E-cores can function as a multi-core accelerator. But what if you are running two or more performance-critical applications at the same time? It will certainly perform very badly against an all-P-core system.

Abwx · Dec 20, 2022

Kocicak said:
What are you talking about? I already posted and clearly explained that HT in the 13900K is NOT USED AT ALL for loads of 24 or fewer threads.

I indicated the performance of the threads above:

2130 - 1st thread on a P core
1210 - thread on an E core
700 - 2nd thread on a P core.

When work is assigned to these threads, the 2nd threads on P cores get it last, because of their weak performance.

Where does your "up to 100% increase in throughput" claim come from, when in the absolute worst-case scenario of a 100% multithreaded load, disabling HT causes just a 13% drop in performance?

That doesn't work like this; resources are fully shared when there are two threads:

2130 - 1st thread on a P core

With the MT score being 2830:

1415 - 1st thread on a P core.
1415 - 2nd thread on a P core.
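The 13% figure follows from the per-thread scores quoted above once they are multiplied out over the 13900K's 8 P-cores and 16 E-cores. A minimal sketch, assuming perfectly additive threads and the fill order described in the thread (P-cores first, then E-cores, then the P-cores' SMT siblings); `throughput` is a hypothetical helper. It also shows that the 2130 + 700 split and Abwx's 1415 + 1415 split give the same per-core total of 2830: the two posts disagree about how that total is divided between the threads, not about the total itself.

```python
# Per-thread scores as quoted in the thread for a 13900K (8 P-cores, 16 E-cores).
P_FIRST = 2130   # first thread on a P-core
E_THREAD = 1210  # thread on an E-core
P_SECOND = 700   # second (SMT) thread on a P-core
P_CORES, E_CORES = 8, 16


def throughput(n_threads, smt=True):
    """Aggregate score for n busy threads, filled P-cores first, then
    E-cores, then (only if SMT is on) the P-cores' second threads."""
    slots = [P_FIRST] * P_CORES + [E_THREAD] * E_CORES
    if smt:
        slots += [P_SECOND] * P_CORES
    return sum(slots[:n_threads])  # extra threads beyond the slots add nothing


full_on = throughput(32)
full_off = throughput(32, smt=False)  # only 24 hardware threads exist
print(full_on, full_off, f"drop without HT: {(full_on - full_off) / full_on:.1%}")

# Abwx's reading: same per-core total, split evenly between the two threads.
per_core_total = P_FIRST + P_SECOND
print(per_core_total, per_core_total / 2)
```

Note that `throughput(24)` is identical with SMT on or off under this fill order, which is the "not used at all for 24 or fewer threads" point.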

Kocicak · Dec 20, 2022

Markfw said:
A lot of NORMAL people do some encoding and such. They get a big benefit with SMT. If I thought about it more I am sure I could come up with a list of things that NORMAL people do that benefit from SMT.

That benefit is not that big. It gets even smaller, if you run a power limited CPU (which is almost necessary now), because higher load with SMT will make the CPU run slower.

I would argue that people who run heavy MT loads occasionally, and not for a living, do not care whether a task takes 60 minutes or 75.

And if they do it for a living, using a significant percentage of the computer's time, they need a lot of cores, many more than 16 or 24, and multithreading as well.

BorisTheBlade82 · Dec 20, 2022

Abwx said:
That doesn't work like this; resources are fully shared when there are two threads:

2130 - 1st thread on a P core

With the MT score being 2830:
1415 - 1st thread on a P core.
1415 - 2nd thread on a P core.

Hell, no. And you can see it with your own eyes. Just run Cinebench on an SMT CPU and watch how fast the tiles get rendered. Half of them are rendered much faster because they run on the primary thread of each core. The other half run on the second thread and are much slower.
If what you say were right, SMT would be terrible for games or any other lightly threaded workload.

scineram · Dec 20, 2022

Abwx said:
That doesn't work like this; resources are fully shared when there are two threads:

2130 - 1st thread on a P core

With the MT score being 2830:
1415 - 1st thread on a P core.
1415 - 2nd thread on a P core.

This is an important point that many people have not yet come to terms with. Hyperthreads drastically cut per-thread performance, hopefully to more than half of what a single thread can achieve.

Markfw · Dec 20, 2022

SMT on: the conclusion here is an average 22% benefit for a 16-core 5950X.

Investigating Performance of Multi-Threading on Zen 3 and AMD Ryzen 5000

www.anandtech.com

LightningZ71 · Dec 20, 2022

SMT's advantage over not having it is highly situation-dependent and relates a LOT to L2 cache behavior, branch predictor design and capacity, how wide the core is, and how different the resources needed by one thread are from those needed by the other, i.e., how well the two threads can share the same resources. HT can be a complete wash and bring almost no boost. It's also possible for two highly localized, small-footprint threads with no resource contention to realize almost a 100% throughput improvement (just like the poster above was seeing with their applications). However, the average is in the 20-30% range.
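The range described above (from a wash up to nearly 100%) can be illustrated with a toy model. This is a hypothetical sketch, not a simulation of any real core: two identical threads each keep a fraction `solo_util` of the core's issue slots busy when running alone, together they can use at most 100% of the core, and a made-up `contention` factor stands in for shared-cache and predictor interference.

```python
def smt_uplift(solo_util, contention=0.0):
    """Toy model of SMT throughput gain for two identical threads.

    solo_util:  fraction of the core's issue slots one thread keeps busy
                when it runs alone (0 < solo_util <= 1).
    contention: hypothetical fractional throughput lost when the two threads
                share caches, predictor tables, etc. (0 <= contention < 1).

    Returns the gain of running two threads over one, e.g. 0.25 for +25%.
    """
    if not 0 < solo_util <= 1:
        raise ValueError("solo_util must be in (0, 1]")
    combined = min(2 * solo_util, 1.0) * (1.0 - contention)
    return combined / solo_util - 1.0


for util, cont in [(0.5, 0.0), (0.8, 0.0), (1.0, 0.0), (0.8, 0.2)]:
    print(f"solo use {util:.0%}, contention {cont:.0%}: {smt_uplift(util, cont):+.0%}")
```

A thread that leaves half the core idle can double throughput; one that already saturates the core gains nothing, and contention can erase a gain entirely.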

Carfax83 · Dec 20, 2022

LightningZ71 said:
Memory latency is still going to be the same for any single thread (give or take contention between threads in an SMT design) no matter if the core is SMT enabled or not.

My point was, the faster a CPU becomes, the more it has to wait on memory, which increases the potential for a thread stall. SMT helps to mitigate that somewhat because if one thread stalls, the other can continue to process. GPUs use a similar tactic with the sheer amount of threads they process at once. Latency doesn't really matter if you have thousands of threads churning away at any given time. If one doesn't complete a task for any particular reason, there are many others that will.
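The "faster CPU waits longer on memory" point can be made concrete by counting cycles: at a fixed latency in nanoseconds, the number of core cycles lost per miss grows linearly with clock speed, and those are the cycles an SMT sibling could fill. A minimal sketch; the 80 ns figure is a hypothetical round number, not a measured latency for any platform.

```python
def stall_cycles(latency_ns, clock_ghz):
    """Core clock cycles spent waiting on one memory access.
    1 GHz means 1 cycle per nanosecond, so cycles = ns * GHz."""
    return latency_ns * clock_ghz


DRAM_LATENCY_NS = 80  # hypothetical round number for one DRAM access

for ghz in (3.0, 4.5, 6.0):
    print(f"{ghz} GHz: ~{stall_cycles(DRAM_LATENCY_NS, ghz):.0f} cycles per miss")
```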

LightningZ71 said:

I have plenty of faith in Intel and AMD engineers. It's the marketing and C-suite executives who, as we all know full well, will sometimes choose advertisable stats that beat the competition at the expense of technical elegance.

But Intel has had SMT for a long time now, two decades in fact. It started with the Pentium 4; then they took a break with the original Core Duo and Quad series, resumed with Nehalem, and haven't let up since.

Why stop now? SMT has better returns now than it did on those older architectures, because current CPUs are much wider and have higher clock speeds to boot.

Carfax83 · Dec 20, 2022

Kocicak said:
Anybody between these two is an anomaly and should not exist, and if they exist and complain about lack of MT in CPUs for normal people, they should not be listened to. They should be silent, work hard and save money for CPUs they really need.

So you're telling me I should buy a Threadripper or Xeon because I occasionally transcode my Blu-ray movie collection to x265 or AV1?

Are you serious?

Markfw · Dec 20, 2022

Carfax83 said:
So you're telling me I should buy a Threadripper or Xeon because I occasionally transcode my Blu-ray movie collection to x265 or AV1?

Are you serious?

The post you quoted is almost trolling.

LightningZ71 · Dec 20, 2022

Why stop now? Because their own creation (the P+E hybrid x86 processor) is proving able to provide superior MT throughput from a quad of E-cores than from the same die space allocated to a P-core. Because the performance of many applications is governed largely by the performance of just one or two threads, and providing superior single-threaded performance for those few threads is important both for application performance and for comparative marketing benchmarks. Because they want to make the best possible all-around processor for desktop use?

And, before we start with "well, they use the same cores for server that they use for desktop..." no, they don't. They have been using enhanced cores for servers for several generations now that provide superior AVX-512 throughput as well as having used a mesh architecture for most of a decade now whereas they use a modified ring on desktop. The processors are VERY different.

As for memory latency, what else helps with stalls on memory access? Larger L1s and more aggressive prefetch algorithms, both of which can be built with the transistors left over from removing HT. The difference is that removing HT to improve ST performance actually improves ST performance, as opposed to just making HT work better. If you're worried about other threads, there are a ton of E-cores to handle them. A stalled P-core also reduces power consumption and waste heat buildup while it waits for the memory access to complete, letting it maintain higher clocks once it finally gets the needed data.

Kocicak · Dec 20, 2022

Carfax83 said:
So you're telling me I should buy a Threadripper or Xeon because I occasionally transcode my Blu-ray movie collection to x265 or AV1?

Are you serious?

Yes. If you read carefully what I wrote, you are a normal person who occasionally needs to run some heavier multithreaded load.

I believe that either the 7950X or the 13900K can satisfy your needs even with hyperthreading off.

If you transcoded movies for living, you would surely be better off buying a Threadripper or something similar with hyperthreading enabled.

Markfw · Dec 20, 2022

Kocicak said:
Yes. If you read carefully what I wrote, you are a normal person who occasionally needs to run some heavier multithreaded load.

I believe that either the 7950X or the 13900K can satisfy your needs even with hyperthreading off.

If you transcoded movies for living, you would surely be better off buying a Threadripper or something similar with hyperthreading enabled.

If you have hyperthreading on either of those two, and you get a ~25% performance bonus with it on, why would you turn it off? You are making no sense at all here.

Kocicak · Dec 20, 2022

I am not telling anybody to turn the HT off. The discussion is about possible performance improvement enabled by simplifying the core by removing HT circuitry from it.

I actually compared the 13900K with HT on and off, and a hypothetical 13900K whose P-cores were improved by just 3% (that is a very careful guess). The graph is at the top of the second page.

If you add up all the regressions and improvements over the whole load-intensity spectrum, there is an overall net gain even with the low 3% improvement.
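The graph itself isn't reproduced here, but the comparison it describes can be sketched from the per-thread scores quoted elsewhere in the thread (2130 / 1210 / 700). Assumptions: perfectly additive threads, the P-then-E-then-SMT-sibling fill order described in the thread, and the posited 3% P-core gain when HT is removed; `score` is a hypothetical helper. The sketch only prints the difference at each load level and leaves open how those levels should be weighted.

```python
# Per-thread scores quoted in the thread (13900K: 8 P-cores, 16 E-cores).
P_FIRST, E_THREAD, P_SECOND = 2130, 1210, 700
P_CORES, E_CORES = 8, 16
P_UPLIFT = 1.03  # the post's "very careful guess" for a P-core without HT


def score(n, ht):
    """Aggregate score for n busy threads: P-cores first, then E-cores,
    then (only with HT) the P-cores' second threads. Assumes perfect scaling."""
    p = P_FIRST if ht else P_FIRST * P_UPLIFT
    slots = [p] * P_CORES + [E_THREAD] * E_CORES
    if ht:
        slots += [P_SECOND] * P_CORES
    return sum(slots[:n])


for n in (1, 8, 16, 24, 28, 32):
    delta = score(n, ht=False) / score(n, ht=True) - 1
    print(f"{n:2d} threads: no-HT (+3% P-core) vs. HT on: {delta:+.1%}")
```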

dullard · Dec 20, 2022

george198011 said:
More p-cores rather than e-waste cores is way way way better!!!

That is true only for a very small set of software.

Markfw · Dec 20, 2022

Kocicak said:
I am not telling anybody to turn the HT off. The discussion is about possible performance improvement enabled by simplifying the core by removing HT circuitry from it.

I actually compared the 13900K with HT on and off, and a hypothetical 13900K whose P-cores were improved by just 3% (that is a very careful guess). The graph is at the top of the second page.

If you add all regressions and improvements over the whole load intensity spectrum, there is overall net gain even with the low 3% improvement.

It's 20-35% gains from HT, with very little die space required for the circuitry. There is NO reason not to have it.

Abwx · Dec 20, 2022

BorisTheBlade82 said:
Hell, no. And you can see it with your own eyes. Just run Cinebench on a SMT CPU and look how fast the tiles get rendered. Half of them is rendered much faster because they are the prime thread per core. The other half is the second thread and is much slower.
If what you say was right, then SMT would be terrible for Games or any other lightly threaded workload.

That's 50/50 in Cinebench as well; it's just that the tiles require different amounts of computation. Some need few computations while others require far more, as you can see when running the test in ST.

In games, the drivers call for one thread per core until the cores are exhausted, and only then is SMT used; but apart from a corner case here and there, most games are content with 6 main threads.

Exist50 · Dec 20, 2022

SMT makes sense in a world where you have only one core that needs to fit every task. In that situation, sacrificing a couple percent ST performance for ~30% more throughput performance is easily worth it, hence why SMT still exists. But that's not the world we'll have in the future, where both big and small cores will be an option across the product stack.

I'm not going to belabor the point, but as others have already said, most workloads have a couple of ST critical threads at most, and the few that do scale beyond that often can use many cores. So for most workloads, the ideal config would be a number of ST-optimized big cores to capture those critical threads, and small cores for the rest. SMT only makes sense if you need to accommodate a wide variety of workloads with different threading demands, but we're getting to the point where raw core counts can easily cover a superset.

Gaming is actually interesting, in that it's one of the only consumer workloads demanding a moderate number of ST-critical threads. But even there, 8 big cores (SMT or not) are empirically capable of handling it today. And with chiplets, this is even less of an issue. Imagine Intel had one 8+0 (no SMT) chiplet and another 0+32 one. 2x 8+0 would easily handle gaming, 8+0 + 0+32 would be great for productivity, and 2x 0+32 would be really interesting for certain embedded use cases. And for workstations, they could just add more of the E-core chiplets.

Also, the benefit isn't just the extra ~5% or so of transistors; it's also the engineering time. How many tens of thousands, or even hundreds of thousands, of hours have been devoted to implementing, maintaining, and securing SMT? What could we get if those engineers were devoted to something else? Be that power, area, ST performance, features, whatever.

Exist50 · Dec 20, 2022

LightningZ71 said:
They might very well do better to enable HT on the e cores and remodel the P cores to focus on maximum ST performance. Enabling HT on the E cores is going to put more pressure on the shared L2, which may not allow enough of a boost in performance. However, even if they manage 20% more MT performance from the E cores by adding HT, that should dwarf what they were getting from it on the P cores and further make the case for streamlining them for ST throughput.

I think the big question there is whether the per-thread throughput would be useful. For servers in particular, even in high-throughput use cases, it's useful to think of the requirement as maximum throughput at some minimum per-thread performance. If you want throughput at any cost, you'd combine a bunch of tiny cores with really wide SMT... and you've just reinvented Xeon Phi.
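"Maximum throughput at some minimum per-thread performance" is a constrained selection: discard designs below the per-thread floor, then take the highest-throughput survivor. A minimal sketch; the three core options and all their numbers are hypothetical and only illustrate how the floor changes the winner.

```python
from dataclasses import dataclass


@dataclass
class CoreOption:
    name: str
    threads: int        # hardware threads that fit in a fixed area budget
    per_thread: float   # relative per-thread performance (1.0 = big core)

    @property
    def throughput(self) -> float:
        return self.threads * self.per_thread


def best_for_server(options, min_per_thread):
    """Highest-throughput option whose per-thread speed meets the floor;
    None if nothing qualifies."""
    eligible = [o for o in options if o.per_thread >= min_per_thread]
    return max(eligible, key=lambda o: o.throughput, default=None)


# Hypothetical options filling the same die area; illustrative numbers only.
OPTIONS = [
    CoreOption("big cores, no SMT", threads=8, per_thread=1.00),
    CoreOption("small cores", threads=32, per_thread=0.55),
    CoreOption("tiny cores, 4-way SMT", threads=256, per_thread=0.10),
]

for floor in (0.0, 0.5, 0.9):
    pick = best_for_server(OPTIONS, floor)
    print(f"floor {floor:.1f}: {pick.name} (throughput {pick.throughput:.1f})")
```

With no floor the "throughput at any cost" option wins; as the per-thread floor rises, the choice shifts toward fewer, faster cores.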
