Question Intel 12th to 13th generation performance comparison

GunsMadeAmericaFree · Dec 14, 2022

I thought this was an interesting read - benchmark comparisons between Intel 12th generation & 13th generation:

Article with details

That's an average performance increase of 47% from one generation to the next. I wonder if AMD will have a similar increase?

Kocicak · Dec 19, 2022

TheELF said:
Exactly, you can just disable it if it causes any slow downs for you, and everybody else can still enjoy up to 100% increase in throughput.

What are you talking about? I already posted and clearly explained that HT in the 13900K is NOT USED AT ALL for loads of 24 or fewer threads.

I indicated performance of the threads above:

2130 - 1st thread on a P core
1210 - thread on a E core
700 - 2nd thread on a P core.

When work is assigned to these threads, the 2nd threads on P cores get work last, because of their weak performance.

Where does your "up to 100% increase in throughput" claim come from, when in the absolutely worst case scenario of 100% multithreaded load disabling HT will cause just 13% drop in performance?
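For anyone who wants to check the 13% figure, here is a minimal sketch using the per-thread scores above. It assumes the 13900K's 8 P cores and 16 E cores, and it assumes the scores simply add up under a full multithreaded load, which is a simplification. The variable names are only illustrative.

```python
# Per-thread scores quoted in the post (benchmark units as given there).
P_FIRST = 2130   # 1st thread on a P core
E_CORE = 1210    # thread on an E core
P_SECOND = 700   # 2nd (HT) thread on a P core

P_CORES, E_CORES = 8, 16  # 13900K core counts

# Assumption: under a 100% multithreaded load the scores are additive.
total_ht_on = P_CORES * (P_FIRST + P_SECOND) + E_CORES * E_CORE
total_ht_off = P_CORES * P_FIRST + E_CORES * E_CORE

drop = 1 - total_ht_off / total_ht_on
print(f"HT on:  {total_ht_on}")
print(f"HT off: {total_ht_off}")
print(f"Drop from disabling HT: {drop:.1%}")
```

With these inputs the totals are 42000 and 36400, so the drop works out to about 13.3%.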

TheELF · Dec 19, 2022

Kocicak said:
Where does your "up to 100% increase in throughput" claim come from, when in the absolutely worst case scenario of 100% multithreaded load disabling HT will cause just 13% drop in performance?

Yeah, on a CPU with a lot of cores that don't have any HTT at all it's 13%; if a CPU doesn't have any HTT at all, HTT is going to be a 0% increase, ohhh ahhh.
Still, every core on its own, if it runs two threads that are light and use little IPC each, can run both threads at full speed meaning it gets 100% more throughput.
Yes CB is one of the worst cases for HTT because it uses a lot of IPC, that doesn't mean that all threads are made the same.
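A toy model of that argument, as a rough sketch only: it assumes each thread on its own sustains some fixed IPC, and that the core can sustain a fixed maximum in total. The function name `smt_gain` and the `core_width` of 4.0 are hypothetical, and real cores share resources in far messier ways.

```python
def smt_gain(thread_ipc: float, core_width: float) -> float:
    """Throughput gain from running two identical threads on one core.

    Toy model (an assumption, not how a real core arbitrates resources):
    each thread alone sustains `thread_ipc` instructions per cycle, and
    the core can sustain at most `core_width` in total.
    """
    one_thread = min(thread_ipc, core_width)
    two_threads = min(2 * thread_ipc, core_width)
    return two_threads / one_thread - 1

# Hypothetical core that sustains at most 4 instructions per cycle.
for ipc in (1.0, 2.0, 3.0, 4.0):
    print(f"per-thread IPC {ipc}: SMT gain {smt_gain(ipc, core_width=4.0):.0%}")
```

In this model, light threads (up to half the core's width each) get the full 100%, and the gain shrinks toward 0% as a single thread gets closer to saturating the core on its own.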

The Basics of HyperThreading: What is it? - PCSTATS.com

www.pcstats.com

DrMrLordX · Dec 19, 2022

Thunder 57 said:
If you are going by the Simpsons, I believe that would be "Release the hounds!".

Well yes, I had to take some artistic license. It was that or make reference to "loosing of the bowels" and that's just icky.

So back on-topic (vaguely):

Since Raptor Lake (and presumably Alder Lake) puts preference on Gracemont cores over logical cores created by Hyperthreading, it makes me wonder: do these CPUs suffer performance loss due to cache locality in some workloads? If you put a thread on a logical core and it has to share data with another thread housed on the same physical core, they should have a turnaround time governed by the speed of whichever cache must be accessed to retrieve said data (ideally L1 or L2, since those are local to the core). If you put a thread on a Gracemont core and it has to share data with a Golden Cove/Raptor Cove core, you've got to go out to the ring. Fortunately that penalty is lowered on Raptor Lake.

igor_kavinski · Dec 19, 2022

Kocicak said:
Where does your "up to 100% increase in throughput" claim come from, when in the absolutely worst case scenario of 100% multithreaded load disabling HT will cause just 13% drop in performance?

It's not just the drop in performance. Without HT, there would be fewer cores to juggle the threads around. Until we get something like 1024 cores, HT will be required to keep context switching interruptions to a minimum. HT cores contribute to an overall smoother multitasking experience.

LightningZ71 · Dec 19, 2022

igor_kavinski said:
It's not just the drop in performance. Without HT, there would be fewer cores to juggle the threads around. Until we get something like 1024 cores, HT will be required to keep context switching interruptions to a minimum. HT cores contribute to an overall smoother multitasking experience.

On very low core count processors, I agree. On processors with more physical cores, it's not even noticeable. I've run a Ryzen 4700U as a daily driver for a while and didn't notice any issues with day-to-day performance even though it had no SMT support. I grant that 8 threads is a reasonable floor though. Having used a dual-core, 4-thread computer while on the road in the past, it was definitely not a great experience.

Kocicak · Dec 19, 2022

igor_kavinski said:
It's not just the drop in performance. Without HT, there would be fewer cores to juggle the threads around. Until we get something like 1024 cores, HT will be required to keep context switching interruptions to a minimum. HT cores contribute to an overall smoother multitasking experience.

I am sorry, but this has just no counterpart in reality.

In reality we are talking about 8+16 cores with 32 threads of three kinds: the second kind having 60% and the third 33% of the performance of the first one, and these three sorts of threads are being prioritized according to what they can do, with the third least powerful ones used only if there is no other option.

And you are fighting for these few last lamest threads, which hinder the performance of the first most useful kind. This is dumb.
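To put numbers on the three kinds of threads, here is a minimal sketch using the scores I posted earlier. The greedy "strongest thread first" fill is an assumed simplification of what the scheduler does, and the names `SCORES`, `COUNTS` and `threads_used` are made up for illustration.

```python
# Per-thread scores from the earlier post.
SCORES = {"P 1st thread": 2130, "E core": 1210, "P 2nd thread": 700}
# Hardware thread counts on an 8+16 part.
COUNTS = {"P 1st thread": 8, "E core": 16, "P 2nd thread": 8}

# Relative performance of each kind versus the 1st thread on a P core.
for kind, score in SCORES.items():
    print(f"{kind}: {score / SCORES['P 1st thread']:.0%}")

def threads_used(load: int) -> dict:
    """Greedy fill: strongest hardware threads first (assumed policy)."""
    used = {}
    for kind in sorted(SCORES, key=SCORES.get, reverse=True):
        take = min(load, COUNTS[kind])
        used[kind] = take
        load -= take
    return used

print(threads_used(24))  # no 2nd P-core threads needed
print(threads_used(32))  # only now are the 2nd P-core threads used
```

With these exact scores the E core comes out nearer 57% than 60%, and the 2nd P-core thread at 33%. Under the greedy fill, a load of 24 threads never touches the HT siblings.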

igor_kavinski · Dec 19, 2022

Kocicak said:
And you are fighting for these few last lamest threads, which hinder the performance of the first most useful kind. This is dumb.

Not really fighting coz I know HT isn't going anywhere 😀

And disabling HT does increase the ST performance of Core i5-12400 in Geekbench 5, from my own testing.

The only thing that's debatable is whether the silicon space freed up by removing HT could be used to further increase the ST performance of a core.

Kocicak · Dec 19, 2022

HT in PCs is on a stretcher in an ambulance gasping for its last breath.

It is not about space alone but also about other measures and connections (such as safety measures), as somebody already mentioned.

Without HT a core can be improved more easily than with it. When improving a simpler thing, you have less chance of your improvements interfering with the extra stuff present on a complicated core with HT.

Keep in mind that a P core has a total computing power of 2130 + 700 across its two threads, with its resources well used by the two threads, at 5500 MHz. If you recalculate the E core to the same frequency, it does 1480 with one thread and less optimally used resources. So an E core has HALF the performance of a P core, while being FOUR TIMES smaller.

I believe that a significant part of this P core area inefficiency can be attributed to HT circuitry.
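The arithmetic behind that comparison, as a quick sketch. It takes the 1480 figure and the "four times smaller" area ratio at face value; both are inputs from this post, not measurements made by the code, and the constant names are illustrative.

```python
P_CORE_TOTAL = 2130 + 700   # both threads of one P core at 5500 MHz
E_CORE_SCALED = 1480        # E core score recalculated to 5500 MHz (figure from the post)
AREA_RATIO = 4              # P core area / E core area, as claimed in the post

perf_ratio = E_CORE_SCALED / P_CORE_TOTAL
perf_per_area_ratio = perf_ratio * AREA_RATIO

print(f"E core / P core performance: {perf_ratio:.0%}")
print(f"E core performance per area vs P core: {perf_per_area_ratio:.2f}x")
```

Under those assumptions an E core delivers about 52% of a fully loaded P core, which is roughly twice the performance per unit of area.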

BorisTheBlade82 · Dec 19, 2022

LightningZ71 said:
I made this point before: though many applications are multithreaded, they still commonly have precious few, often just one or two, that are "performance critical" in that the performance of those specific threads is the main limiting factor with respect to user experience or application productivity. The other threads tend to be housekeeping, background processes and work-ahead prestaging of information which, while important, is not compute heavy and isn't performance determinant.

If we can make those performance critical threads more performant while still maintaining sufficient MT throughput via E cores, we improve more cases for everyone.

Quoted for truth! I am really wondering why this is so hard to grasp for some people around here.

igor_kavinski · Dec 19, 2022

Please correct me if I'm wrong but wasn't HT introduced to mitigate pipeline stalls due to branchy code? If the pipeline is flushed, the CPU has to start all over again. With HT, at least the 2nd HT thread in flight can still get SOMETHING accomplished, rather than all CPU cycles up to the pipeline stall wasted.

Markfw · Dec 19, 2022

Kocicak said:
HT in PCs is on a stretcher in an ambulance gasping for its last breath.

It is not about space alone but also about other measures and connections (such as safety measures), as somebody already mentioned.

Without HT a core can be improved more easily than with it. When improving a simpler thing, you have less chance of your improvements interfering with the extra stuff present on a complicated core with HT.

Keep in mind that a P core has a total computing power of 2130 + 700 across its two threads, with its resources well used by the two threads, at 5500 MHz. If you recalculate the E core to the same frequency, it does 1480 with one thread and less optimally used resources. So an E core has HALF the performance of a P core, while being FOUR TIMES smaller.

I believe that a significant part of this P core area inefficiency can be attributed to HT circuitry.

You are an amateur. I have seen many professional studies done on SMT. MOST of the time it provides a 30-40% increase in performance. On a very small percentage of tasks it did hurt performance.

Kocicak · Dec 19, 2022

Markfw said:
You are an amateur.

That is correct, I am also discussing PCs for amateurs. Professionals running intensive multithreaded applications 24/7 should have workstations with server CPUs in them or servers.

AMD sells very nice server CPUs rebranded as Threadripper for such workstations.

I am not discussing such CPUs.

BorisTheBlade82 · Dec 19, 2022

igor_kavinski said:
Please correct me if I'm wrong but wasn't HT introduced to mitigate pipeline stalls due to branchy code? If the pipeline is flushed, the CPU has to start all over again. With HT, at least the 2nd HT thread in flight can still get SOMETHING accomplished, rather than all CPU cycles up to the pipeline stall wasted.

HT was introduced to get more out of the resources of a wide and/or deep core. ATM I do not generally question the benefit of HT. The transistor budget used was generally worth it - but that is something that might change over time for reasons @LightningZ71 explained.
What I want to emphasize is that in the general distribution of workloads there are two main clusters: the ones that scale with only one or very few threads, and the ones that scale more or less indefinitely. Everything in between is rather minor. So for the majority of workloads it might be well worth it to have fewer, but more powerful, big cores and more small cores in the same transistor budget.
For me it is highly likely that Intel R&D arrived at 6+24 for RPL being optimal, but Marketing vetoed this by saying that they must not go below 8 cores for their Top dog for Marketing reasons.
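As a rough sanity check on 6+24 versus 8+16, here is a back-of-the-envelope sketch. It borrows the per-thread scores and the "E core is four times smaller" ratio posted earlier in this thread, treats throughput as additive, and ignores cache, ring and power limits entirely. The function `config` and the constants are hypothetical.

```python
P_CORE = 2130 + 700  # P core with both HT threads busy (scores from earlier in the thread)
E_CORE = 1210        # one E core
E_PER_P_AREA = 4     # assumption from the thread: four E cores fit in one P core's area

def config(p_cores: int, e_cores: int) -> tuple[float, int]:
    """Return (area in P-core equivalents, additive throughput score)."""
    area = p_cores + e_cores / E_PER_P_AREA
    throughput = p_cores * P_CORE + e_cores * E_CORE
    return area, throughput

for p, e in ((8, 16), (6, 24)):
    area, score = config(p, e)
    print(f"{p}+{e}: area {area:.0f} P-core equivalents, MT score {score}")
```

Under these assumptions both layouts take the same 12 P-core equivalents of area, while 6+24 scores 46020 against 42000 for 8+16 in this simple additive model.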

Markfw · Dec 19, 2022

Kocicak said:
That is correct, I am also discussing PCs for amateurs. Professionals running intensive multithreaded applications 24/7 should have workstations with server CPUs in them or servers.

AMD sells very nice server CPUs rebranded as Threadripper for such workstations.

I am not discussing such CPUs.

No, these results were for average PCs, not Threadrippers or servers.

igor_kavinski · Dec 19, 2022

BorisTheBlade82 said:
HT was introduced to get more out of the resources of a wide and/or deep core.

Raptor Cove is 6 wide while Gracemont is 5 wide. I have a feeling that Intel will enable HT on Crestmont in Meteor Lake to get the most processing power out of the limited space on that CPU die. Otherwise, it could become really hard for them to compete with Zen 5.

LightningZ71 · Dec 19, 2022

igor_kavinski said:
Raptor Cove is 6 wide while Gracemont is 5 wide. I have a feeling that Intel will enable HT on Crestmont in Meteor Lake to get the most processing power out of the limited space on that CPU die. Otherwise, it could become really hard for them to compete with Zen 5.

They might very well do better to enable HT on the e cores and remodel the P cores to focus on maximum ST performance. Enabling HT on the E cores is going to put more pressure on the shared L2, which may not allow enough of a boost in performance. However, even if they manage 20% more MT performance from the E cores by adding HT, that should dwarf what they were getting from it on the P cores and further make the case for streamlining them for ST throughput.

igor_kavinski · Dec 19, 2022

LightningZ71 said:
They might very well do better to enable HT on the e cores and remodel the P cores to focus on maximum ST performance.

Now that I can get behind!

Kocicak · Dec 19, 2022

LightningZ71 said:
They might very well do better to enable HT on the e cores

AAAAARGH! You know that E cores are as streamlined as possible, right? How much would they swell if you put HT stuff on them? And what could you practically accomplish with a gazillion of extremely weak second threads on E cores? The threads exist to actually DO SOMETHING USEFUL.

DrMrLordX · Dec 19, 2022

Kocicak said:
How much would they swell if you put HT stuff on them?

In terms of raw silicon area? Not by much.

igor_kavinski · Dec 19, 2022

Kocicak said:
AAAAARGH!

Now we know your worst nightmare 😀

LightningZ71 · Dec 19, 2022

Kocicak said:
AAAAARGH! You know that E cores are as streamlined as possible, right? How much would they swell if you put HT stuff on them? And what could you practically accomplish with a gazillion of extremely weak second threads on E cores? The threads exist to actually DO SOMETHING USEFUL.

Well, they aren't going to be on Intel7+, but on some denser node, as this wouldn't be a retrofit of existing Gracemont but something included in a future product. I dare say that they could invest transistors in making them HT enabled as a focus, grab whatever low-hanging fruit they can get in there, and maintain the same area ratio of roughly 0.25 P cores each. The point of the E cores is MT throughput with optimized area usage. HT could make a difference there.

Carfax83 · Dec 19, 2022

Another reason why SMT is useful with contemporary x86-64 CPUs is that those cores are clocked very high, close to 6 GHz or right at 6 GHz for the upcoming 13900KS.

High clock speeds mean increased branch misprediction and pipeline stall penalties as well as increased memory latency, both of which SMT helps to mitigate. Arm CPUs tend to be clocked much lower than comparable x86-64 designs and probably wouldn't benefit from SMT as much.

Some people on this forum act as though Intel and AMD engineers are incompetent and don't know what they are doing. There's a reason why SMT has been used for such a long time, and the benefits typically outweigh any of the drawbacks. I don't even bother turning it off for just a few percentage points increase in whatever application or game.

BorisTheBlade82 · Dec 20, 2022

Kocicak said:
AAAAARGH! You know that E cores are as streamlined as possible, right? How much would they swell if you put HT stuff on them? And what could you practically accomplish with a gazillion of extremely weak second threads on E cores? The threads exist to actually DO SOMETHING USEFUL.

SMT2 was always said to add around 10% to the logic transistor budget IIRC.
As for SMT in small cores: Ironically C'n'C just posted an article about Knights Landing - an Atom based CPU with SMT4 and brutal scaling.

Knight’s Landing: Atom with AVX-512

Intel is known for their high performance cores, which combine large out of order execution engines with high clock speeds to maximize single threaded performance.

chipsandcheese.com

JustViewing · Dec 20, 2022

Sometimes HT gives a near 100% boost. This happens in enterprise applications written either in DotNet or Java. I have personally observed this many times. The "Review Benchmark" applications are optimized for performance, therefore they don't scale well with HT. For the average enterprise application, code maintainability is the main concern. They usually don't care about "CPU cache" hit rate. There is built-in functionality in DotNet, for example, to easily introduce multithreading to sections of code. So in essence, poorly-optimized/memory-dependent applications can greatly benefit from HT.

On the other extreme, when I wrote applications in ASM there was no or negative scaling with HT. This is because when writing ASM you always have the cache limit in the back of your mind.

DrMrLordX · Dec 20, 2022

JustViewing said:
Sometimes HT gives a near 100% boost. This happens in enterprise applications written either in DotNet or Java. I have personally observed this many times. The "Review Benchmark" applications are optimized for performance, therefore they don't scale well with HT. For the average enterprise application, code maintainability is the main concern. They usually don't care about "CPU cache" hit rate. There is built-in functionality in DotNet, for example, to easily introduce multithreading to sections of code. So in essence, poorly-optimized/memory-dependent applications can greatly benefit from HT.

On the other extreme, when I wrote applications in ASM there was no or negative scaling with HT. This is because when writing ASM you always have the cache limit in the back of your mind.

Dr. Cutress' 3DPM v1 showed massive improvement from HT, mostly due to cache thrashing.
