Question Intel 12th to 13th generation performance comparison

GunsMadeAmericaFree · Dec 14, 2022

I thought this was an interesting read - benchmark comparisons between Intel 12th generation & 13th generation:

Article with details

That's an average performance increase of 47% from one generation to the next. I wonder if AMD will have a similar increase?

Kocicak · Dec 17, 2022

I am not sure if you know how the load is distributed across the cores, but the second threads of the P cores are the last to get work. I recently tested a 13900K with and without HT under increasing load; the results were equal up to about 35K Cinebench points. That is where the HT OFF chip topped out, while the HT ON chip could add a further 5K points.

Turning HT off affects performance ONLY if you feed the chip 25 or more threads. Loads of 24 threads or fewer see no negative effect from turning HT off.

That is a 12.5% performance penalty, which is realised ONLY if you run an extremely high 32-thread load. If removing HT could bring even a 2% improvement in 1-8 thread loads, I am all for it. I would personally never use that extremely high load anyway, so I would not see any negative performance impact.
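The figures above can be checked with a few lines of arithmetic. This is a minimal sketch using only the approximate scores quoted in the post (about 35K points with HT off, about 5K more with HT on) and the 8 P + 16 E core layout of the 13900K; the function and constant names are illustrative, not taken from any benchmark tool.

```python
# Approximate Cinebench figures quoted in the post (not exact measurements).
SCORE_HT_OFF = 35_000      # full-load score with HT disabled
EXTRA_FROM_HT = 5_000      # additional points with HT enabled

# 13900K layout: 8 P cores + 16 E cores = 24 physical cores.
PHYSICAL_CORES = 8 + 16


def ht_off_penalty(score_ht_off: float, extra_from_ht: float) -> float:
    """Fraction of full-load throughput lost when HT is disabled."""
    score_ht_on = score_ht_off + extra_from_ht
    return extra_from_ht / score_ht_on


def ht_matters(threads: int) -> bool:
    """Per the post, HT only contributes once every physical core is busy."""
    return threads > PHYSICAL_CORES


penalty = ht_off_penalty(SCORE_HT_OFF, EXTRA_FROM_HT)
print(f"Penalty at 32 threads: {penalty:.1%}")
print(f"HT matters at 24 threads: {ht_matters(24)}")
print(f"HT matters at 25 threads: {ht_matters(25)}")
```

Note that the same 5K difference is a roughly 14% gain when measured against the HT OFF score instead, so the percentage depends on which configuration is taken as the baseline.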

Kocicak · Dec 17, 2022

Based on my previously measured results, I compared a 13900K with HT on, a 13900K with HT off, and a hypothetical 13900K with its P core HT circuitry removed, assuming the removal allows a 3% P core performance improvement. The outcome depends on load intensity.

Only somebody running full multithreaded load all the time would prefer to have HT on the chip (and those people should probably get some proper workstation with server CPU in it).

Everybody else would welcome 3% P core improvement enabled by HT removal.

LightningZ71 · Dec 17, 2022

Some people are failing to grasp other elements of the situation. If Intel had decided to sacrifice HT on the P cores in a quest for higher ST performance, the entire stack would look different! Intel already used four different masks for Alder Lake (2+8, 6+8, 8+8, and 6+0). Raptor Lake adds an additional mask. Without HT, the dies would be split differently and there would be E cores at every i-level: i3 would be 2+4 and 2+8, i5 would be 4+8 with K being 4+16, i7 would be 6+8 with K being 6+16, and i9 would be 6+16 with K being 6+24. The Intel Processor (ex-Pentium and Celeron) could be die recovery.

With 3-5% more ST performance and more hardware cores to throw at MT tasks, every benchmark, save for ones that exclusively hammer the 7 and 8 thread cases, would see a notable uplift. Also, with only 2 tiers of thread types to manage, even the thread director would work better.

igor_kavinski · Dec 17, 2022

LightningZ71 said:
Also, with only 2 tiers of thread types to manage, even the thread director would work better.

Oh no. Kocicak's virus has infected you too!

Either you are both crazy or we are witnessing a true prediction of HT's future demise.

VirtualLarry · Dec 17, 2022

There's no performance reason to get rid of HT. Not at this stage of its evolution. Don't forget, for very fast cores, it helps to mask memory access latency when they are forced to go out to DRAM.

TheELF · Dec 18, 2022

Kocicak said:
That is a 12.5% performance penalty,

If you use all of your full cores and all of your small cores together...which is somehow representative to single core speeds for you?!

On modern CPUs you get ONLY ONE or maybe two cores that run at the highest speed (Intel Thermal Velocity Boost), and it only applies to the best two cores.
So removing HT would cut possible throughput by 100% or 50% respectively if you get the highest boost on one or two cores.

Single-Core TVB Takes the faster of the two favored CPU cores to a speed superior to Turbo Boost Max 3.0.

How Intel Technologies Boost Your CPU's Performance - Intel

Innovations like Adaptive Boost Technology make 11th Gen CPUs faster. Learn how Turbo Boost, Thermal Velocity Boost, and ABT work together.

www.intel.com

Kocicak · Dec 18, 2022

The example I posted above does not use any super high single core frequencies, all the P and E cores ran at the indicated frequencies all the time.

The hypothetical improvements enabled by removing HT circuitry would benefit performance of the P cores at any frequency.

BTW if you add percentage improvements and regressions in the above scenario, you get a net gain in performance.

If you applied some weights preferring lower intensity load, the net gain would be even larger.

igor_kavinski · Dec 18, 2022

Kocicak said:
The hypothetical improvements enabled by removing HT circuitry would benefit performance of the P cores at any frequency.

Sorry but don't think that's happening ever (unless Intel creates a die with a sea of 100 cores or something). Let's suppose Intel has 32 P-cores in their flagship consumer part in the future. Marketing will complain if they remove HT and the illusion of double the amount of cores. 64 looks a lot more powerful than 32 on paper. It's like silicone implants. First impressions matter.

VirtualLarry · Dec 18, 2022

Kocicak said:
The hypothetical improvements enabled by removing HT circuitry would benefit performance of the P cores at any frequency.

On the whole, this is NOT true.

It may have a slight benefit to ST apps, but would grossly waste throughput in MT scenarios.

Since software is increasingly becoming MT, both within a single app, as well as collectively as an OS, as other bottlenecks (storage with PCI-E 5.0, networking with 2.5/10GbE) get obliterated.

The writing is on the wall. Removing HT at this point would be a net loss to performance.

Kocicak · Dec 18, 2022

It may be useful to remember that it was the high single core performance of Alder Lake CPUs that put Intel back on track in PCs, nothing else.

Achieving the highest possible single core performance is the first priority for Intel. They do it even at the cost of possibly running the CPUs at frequencies outside of what they are really capable of.

igor_kavinski · Dec 18, 2022

https://www.researchgate.net/publication/332218864_Bootstrapping_Using_SMT_Hardware_to_Improve_Single-Thread_Performance

Imagine SMT4 with three look ahead threads and one main thread. THAT, if developed meticulously to minimize drawbacks, could lead to substantial ST gains.

TheELF · Dec 18, 2022

Kocicak said:
It may be useful to remember that it was the high single core performance of Alder Lake CPUs that put Intel back on track in PCs, nothing else.

Achieving highest possible single core performance is a first priority for Intel. They do it even at the cost of possibly running the CPUs at frequencies outside of what they are really capable of.

No, the ST clock Intel states is what each and every single CPU can reach when running whatever you want, even the heaviest threads, and it will keep running at that speed as long as you can supply the power and cooling it needs.
You can (single core) overclock above that, so it's well within what they are capable of.
It's Ryzen where they state a max clock that you only see in very small bursts and only if you run very light threads.

LightningZ71 · Dec 18, 2022

How is my point not getting through?

I am not proposing that they just eliminate HT in isolation! I am proposing that a processor could see a notable performance improvement by doing three things:
1) eliminate HT and use that circuit and transistor cost for additional ST throughput.
2) reduce the number of large cores at the top end of the stack
3) use the die real-estate from the reduction of the large cores to add more small cores.

As an example: instead of Raptor Lake being 8 P cores plus 16 E cores, make it 6 P cores with 24 E cores and ALSO remove HT from the P cores, using that recovered transistor budget for more ST performance. Now it's 6 P+ cores and 24 E cores. You do lose two threads, going from 32 down to 30, but all 30 of those threads are hardware threads, instead of 24 hardware threads and 8 hyperthreads. That is MORE total throughput, because the E cores are much faster than P core second threads. On top of that, the P cores get faster ST performance from the extra transistors spent on it and from there NOT being any L2 contention between threads on the P cores!

It is Win-win everywhere but marketing!
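The thread counts in the example above can be tallied as follows. This is only a counting sketch: the `Config` class and its field names are hypothetical, and it says nothing about how fast each kind of thread is, which is the part of the argument that would need real measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    """A hybrid CPU layout: P cores, E cores, and whether P cores have HT."""
    p_cores: int
    e_cores: int
    p_has_ht: bool

    @property
    def hardware_threads(self) -> int:
        # One thread per physical core, P or E.
        return self.p_cores + self.e_cores

    @property
    def smt_threads(self) -> int:
        # Second threads exist only on P cores, and only when HT is present.
        return self.p_cores if self.p_has_ht else 0

    @property
    def total_threads(self) -> int:
        return self.hardware_threads + self.smt_threads


raptor_lake = Config(p_cores=8, e_cores=16, p_has_ht=True)
proposed = Config(p_cores=6, e_cores=24, p_has_ht=False)

for name, cfg in (("8P+16E with HT", raptor_lake), ("6P+24E, no HT", proposed)):
    print(f"{name}: {cfg.total_threads} threads "
          f"({cfg.hardware_threads} hardware, {cfg.smt_threads} SMT)")
```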

TheELF · Dec 18, 2022

LightningZ71 said:
1) eliminate HT and use that circuit and transistor cost for additional ST throughput.

Yes and also remove body fat from the belly and use that to make somebody taller....

Adding circuitry and transistors to a core does not make it faster, you have to come up with ways to do instructions in fewer clock cycles which is something the industry has been refining for 40+ years now and there isn't much left on that front, or you have to increase the amount of clock cycles per time to make things faster.

Having a higher CB23 on single core is not being faster, it is having more throughput which is the same as having more throughput on additional cores.
If you need single threaded speed you do need it because of things that only have one single thread to run so you can't leverage extra cores or extra IPC on the same core, it only uses as much as it uses.

Thunder 57 · Dec 18, 2022

igor_kavinski said:
Sorry but don't think that's happening ever (unless Intel creates a die with a sea of 100 cores or something). Let's suppose Intel has 32 P-cores in their flagship consumer part in the future. Marketing will complain if they remove HT and the illusion of double the amount of cores. 64 looks a lot more powerful than 32 on paper. It's like silicone implants. First impressions matter.

Liked for correct usage of the word silicone. Now if only people could correctly use the word "lose" and "losing". So many times I see "loosing".

igor_kavinski · Dec 18, 2022

LightningZ71 said:
ALSO remove HT from the P cores and instead use that recovered transistor budget for more ST performance.

If there was something they could do better with that budget that gave them higher performance than HT, maybe they would have done it already.

scineram · Dec 18, 2022

I think high IPC Arm cores lack SMT for a reason. Like Apple would rather have a single thread be able to utilise the core resources as much as possible than trying to fill it up with another thread.

DrMrLordX · Dec 18, 2022

Thunder 57 said:
So many times I see "loosing".

Next time they typo, try loosing the hounds! That'll show em.

Thunder 57 · Dec 18, 2022

DrMrLordX said:
Next time they typo, try loosing the hounds! That'll show em.

If you are going by the Simpsons, I believe that would be "Release the hounds!".

LightningZ71 · Dec 18, 2022

TheELF said:
Yes and also remove body fat from the belly and use that to make somebody taller....

Adding circuitry and transistors to a core does not make it faster, you have to come up with ways to do instructions in fewer clock cycles which is something the industry has been refining for 40+ years now and there isn't much left on that front, or you have to increase the amount of clock cycles per time to make things faster.

Having a higher CB23 on single core is not being faster, it is having more throughput which is the same as having more throughput on additional cores.
If you need single threaded speed you do need it because of things that only have one single thread to run so you can't leverage extra cores or extra IPC on the same core, it only uses as much as it uses.

Someone REALLY needs to inform Apple about the "fact" that you can't increase ST throughput by throwing more transistors at the problem! As we all are well aware, increasing the L1 caches has zero effect on ST, as does expanding the OoO window with larger buffers. Making more microcoded instructions into hardware circuits does nothing for throughput either. Going wider hasn't helped ever either.

The fact is that defending against memory and processing integrity vulnerabilities is requiring more and more transistors and costing base level performance in the process. A large number of those vulnerabilities are associated with SMT implementations. Eliminating SMT also removes a lot of those issues and eliminates the extra transistors spent mitigating and protecting against them. There are many security focused VM hosts out there that disable SMT as a matter of course already, so any transistors spent on SMT are useless to them already.

Carfax83 · Dec 18, 2022

I don't see the point in getting rid of SMT if the performance trend for software is towards increased parallelism. I can't really think of any application that depends purely on single threaded performance anymore. Even the most basic applications like browsers, which everyone uses, have been multithreaded for years now and have become very sophisticated in how they leverage modern CPUs and GPUs.

scineram · Dec 19, 2022

Well then why don't you browse twitter on a 16 core Opteron over a 7600X, without SMT?

LightningZ71 · Dec 19, 2022

Carfax83 said:
I don't see the point in getting rid of SMT if the performance trend for software is towards increased parallelism. I can't really think of any application that depends purely on single threaded performance anymore. Even the most basic applications like browsers, which everyone uses, have been multithreaded for years now and have become very sophisticated in how they leverage modern CPUs and GPUs.

I made this point before: though many applications are multithreaded, they still commonly have precious few, often just one or two, that are "performance critical" in that the performance of those specific threads is the main limiting factor with respect to user experience or application productivity. The other threads tend to be housekeeping, background processes and work-ahead prestaging of information which, while important, is not compute heavy and isn't performance determinant.

If we can make those performance critical threads more performant while still maintaining sufficient MT throughput via E cores, we improve more cases for everyone.

Kocicak · Dec 19, 2022

Carfax83 said:
I don't see the point in getting rid of SMT if the performance trend for software is towards increased parallelism.

JEEZ, PC CPUs have now up to 16 or 24 physical cores available, that is not enough parallelism for you? Nobody wants to make 1 core CPUs anymore.

TheELF · Dec 19, 2022

LightningZ71 said:
Someone REALLY needs to inform Apple about the "fact" that you can't increase ST throughput by throwing more transistors at the problem! As we all are well aware, increasing the L1 caches has zero effect on ST, as does expanding the OoO window with larger buffers. Making more microcoded instructions into hardware circuits does nothing for throughput either. Going wider hasn't helped ever either.

Stop bringing up Apple until they can run Windows and have 100% software compatibility...
Anybody can make an extremely optimized CPU core if they don't have to care about running (all) software on it.
Larger L1 caches, bigger OoO windows, and wider cores have nothing to do with removing HT/SMT; they are things that happen in parallel.
They don't have to remove HT to make space for these things. The F versions of CPUs lack the whole iGPU, so they would have all the space and all the saved transistor budget you would ever be able to use up.

LightningZ71 said:
The fact is that defending against memory and processing integrity vulnerabilities is requiring more and more transistors and costing base level performance in the process. A large number of those vulnerabilities are associated with SMT implementations. Eliminating SMT also removes a lot of those issues and eliminates the extra transistors spent mitigating and protecting against them. There are many security focused VM hosts out there that disable SMT as a matter of course already, so any transistors spent on SMT are useless to them already.

Exactly, you can just disable it if it causes any slow downs for you, and everybody else can still enjoy up to 100% increase in throughput.
