Question Intel 12th to 13th generation performance comparison


GunsMadeAmericaFree

Golden Member
Jan 23, 2007
1,245
290
136
Intel13thGenRefresh.jpg


I thought this was an interesting read - benchmark comparisons between Intel 12th generation & 13th generation:

Article with details

That's an average performance increase of 47% from one generation to the next. I wonder if AMD will have a similar increase?
 

Kocicak

Senior member
Jan 17, 2019
982
973
136
I am not sure if you know how the load is distributed across the cores, but the second threads of the P cores are the last to receive work. I recently tested a 13900K with and without HT under increasing load; the results were equal up to circa 35K Cinebench points. There the HT-off chip stopped scaling, but the HT-on chip could add a further 5K points.

Turning HT off affects performance ONLY if you feed the chip 25 or more threads. A load of 24 threads or fewer sees no negative effect from turning HT off.

That is a 12.5% performance penalty, which is realised ONLY if you run an extremely high 32-thread load. If removing HT could bring even a 2% improvement in 1-8 thread loads, I am all for it. I would personally never use such an extremely high load anyway, so I would not see any negative performance impact.
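The 12.5% figure follows from the scores quoted above: the 5K points that HT adds are measured against the roughly 40K full-load total. A minimal sketch of that arithmetic, taking the approximate Cinebench numbers from the post as given (the function and constant names are only illustrative):

```python
# Approximate Cinebench figures quoted in the post above.
HT_OFF_CEILING = 35_000  # score at which the HT-off chip stops scaling
HT_ON_EXTRA = 5_000      # further points the HT-on chip adds at 32 threads


def ht_off_penalty(ht_off_score: float, ht_on_extra: float) -> float:
    """Fraction of the HT-on full-load score that is lost with HT off."""
    ht_on_score = ht_off_score + ht_on_extra
    return ht_on_extra / ht_on_score


def ht_on_gain(ht_off_score: float, ht_on_extra: float) -> float:
    """The same difference, expressed as a gain over the HT-off score."""
    return ht_on_extra / ht_off_score


penalty = ht_off_penalty(HT_OFF_CEILING, HT_ON_EXTRA)
gain = ht_on_gain(HT_OFF_CEILING, HT_ON_EXTRA)
print(f"penalty with HT off: {penalty:.1%}")  # 12.5%
print(f"gain with HT on:     {gain:.1%}")     # 14.3%
```

The two percentages describe the same 5K difference; which one applies depends on whether the HT-on or the HT-off score is taken as the baseline.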
 

Kocicak

Senior member
Jan 17, 2019
982
973
136
Based on my previously measured results, I made a comparison of the 13900K with HT on and off and a hypothetical 13900K that had its P-core HT circuitry removed, which allowed a 3% performance improvement, depending on load intensity.

Only somebody running a full multithreaded load all the time would prefer to have HT on the chip (and those people should probably get a proper workstation with a server CPU in it).

Everybody else would welcome a 3% P-core improvement enabled by HT removal.

hyp HT remov effect.png
 

LightningZ71

Golden Member
Mar 10, 2017
1,628
1,898
136
Some people are failing to grasp other elements of the situation. If Intel had decided to sacrifice HT on the P cores in a quest for higher ST performance, the entire stack would look different! Intel already used four different masks for Alder Lake (2+8, 6+8, 8+8, and 6+0). Raptor Lake adds an additional mask. Without HT, the dies would be split differently and there would be E cores at every i-level: i3 would be 2+4 and 2+8; i5 would be 4+8, with K being 4+16; i7 would be 6+8, with K being 6+16; and i9 would be 6+16, with K being 6+24. The Intel Processor (ex-Pentium and Celeron) could be die recovery.

With an additional 3-5% ST performance and more hardware cores to throw at MT tasks, every benchmark, save for ones that exclusively hammer the 7- and 8-thread cases, would see a notable uplift. Also, with only two tiers of thread types to manage, even the Thread Director would work better.
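For reference, the hypothetical stack above can be tabulated to see how many hardware threads each tier would expose. This is only a sketch: the P+E splits come from the post, while the tier labels and the helper name are made up for illustration.

```python
# Hypothetical no-HT stack from the post above, as (P cores, E cores).
# The "(low)"/"(high)" labels are illustrative, not real product names.
PROPOSED_STACK = {
    "i3 (low)": (2, 4),
    "i3 (high)": (2, 8),
    "i5": (4, 8),
    "i5 K": (4, 16),
    "i7": (6, 8),
    "i7 K": (6, 16),
    "i9": (6, 16),
    "i9 K": (6, 24),
}


def hardware_threads(p_cores: int, e_cores: int) -> int:
    """Without HT, every core contributes exactly one thread."""
    return p_cores + e_cores


for tier, (p, e) in PROPOSED_STACK.items():
    print(f"{tier:10s} {p}+{e:<2d} -> {hardware_threads(p, e)} threads")
```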
 

VirtualLarry

No Lifer
Aug 25, 2001
56,353
10,050
126
There's no performance reason to get rid of HT. Not at this stage of its evolution. Don't forget, for very fast cores, it helps to mask memory access latency when they are forced to go out to DRAM.
 

TheELF

Diamond Member
Dec 22, 2012
3,973
731
126
"That is a 12.5% performance penalty,"
If you use all of your full cores and all of your small cores together... which is somehow representative of single-core speeds for you?!

On modern CPUs you get ONLY ONE or maybe two cores that run at the highest speed: Intel Thermal Velocity Boost only runs on the best two cores.
So removing HT would cut possible throughput by 100% or 50% respectively if you get the highest boost on one or two cores.
Single-Core TVB: Takes the faster of the two favored CPU cores to a speed superior to Turbo Boost Max 3.0.

s6-a04-05-abt-boost-tech-bar-chart-original-rwd.jpg.rendition.intel.web.1920.1080.jpg
 

Kocicak

Senior member
Jan 17, 2019
982
973
136
The example I posted above does not use any super-high single-core frequencies; all the P and E cores ran at the indicated frequencies all the time.

The hypothetical improvements enabled by removing HT circuitry would benefit performance of the P cores at any frequency.

BTW if you add percentage improvements and regressions in the above scenario, you get a net gain in performance.

If you applied some weights preferring lower intensity load, the net gain would be even larger.
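The "add the percentages, then weight them" idea can be sketched as a weighted mean. The per-load changes and the weights below are placeholder values chosen only to show the mechanics; they are NOT the numbers from the chart posted earlier, and the function name is illustrative.

```python
def weighted_net_change(changes, weights=None):
    """Weighted mean of per-load percentage changes (equal weights by default)."""
    if weights is None:
        weights = [1.0] * len(changes)
    if len(weights) != len(changes):
        raise ValueError("changes and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(c * w for c, w in zip(changes, weights)) / total_weight


# Placeholder inputs: % change at light, medium and heavy load (assumed).
changes = [3.0, 2.0, -4.0]
# Placeholder weights that favour lighter loads (assumed).
prefer_light = [3.0, 2.0, 1.0]

unweighted = weighted_net_change(changes)
weighted = weighted_net_change(changes, prefer_light)
print(f"unweighted: {unweighted:+.2f}%  weighted: {weighted:+.2f}%")
```

Whenever the gains sit at the light end and the regressions at the heavy end, shifting weight toward light loads raises the net figure; whether the unweighted sum is positive in the first place depends on the real measurements.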
 
Jul 27, 2020
16,340
10,352
106
"The hypothetical improvements enabled by removing HT circuitry would benefit performance of the P cores at any frequency."
Sorry, but I don't think that's ever happening (unless Intel creates a die with a sea of 100 cores or something). Let's suppose Intel has 32 P-cores in their flagship consumer part in the future. Marketing will complain if they remove HT and the illusion of double the number of cores. 64 looks a lot more powerful than 32 on paper. It's like silicone implants. First impressions matter.
 

VirtualLarry

No Lifer
Aug 25, 2001
56,353
10,050
126
"The hypothetical improvements enabled by removing HT circuitry would benefit performance of the P cores at any frequency."
On the whole, this is NOT true.

It may have a slight benefit to ST apps, but would grossly waste throughput in MT scenarios.

Software is increasingly becoming MT, both within a single app and collectively across the OS, as other bottlenecks (storage with PCI-E 5.0, networking with 2.5/10GbE) get obliterated.

The writing is on the wall. Removing HT at this point would be a net loss to performance.
 

Kocicak

Senior member
Jan 17, 2019
982
973
136
It may be useful to remember that the high single-core performance of Alder Lake CPUs put Intel back on track in PCs, nothing else.

Achieving the highest possible single-core performance is the first priority for Intel. They do it even at the cost of possibly running the CPUs at frequencies outside of what they are really capable of.
 

TheELF

Diamond Member
Dec 22, 2012
3,973
731
126
"It may be useful to remember that the high single-core performance of Alder Lake CPUs put Intel back on track in PCs, nothing else.

Achieving the highest possible single-core performance is the first priority for Intel. They do it even at the cost of possibly running the CPUs at frequencies outside of what they are really capable of."
No, the ST clock Intel states is what each and every single CPU can reach when running whatever you want, even the heaviest threads, and it will keep running at that speed as long as you can supply the power and cooling it needs.
You can overclock (single core) above that, so it's well inside what they are capable of.
It's Ryzen where they state a max clock that you only see in very small bursts and only if you run very light threads.
 

LightningZ71

Golden Member
Mar 10, 2017
1,628
1,898
136
How is my point not getting through?

I am not proposing that they just eliminate HT in isolation! I am proposing that a processor could see a notable performance improvement by doing three things:
1) eliminate HT and use that circuit and transistor cost for additional ST throughput.
2) reduce the number of large cores at the top end of the stack
3) use the die real-estate from the reduction of the large cores to add more small cores.

As an example: instead of Raptor Lake being 8 P cores plus 16 E cores, make it 6 P cores with 24 E cores and ALSO remove HT from the P cores, using that recovered transistor budget for more ST performance. Now it's 6 P+ cores and 24 E cores. You do lose two threads, going from 32 down to 30, but all of those 30 threads are hardware threads, instead of 24 hardware threads and 8 hyperthreads. That is MORE total throughput, as the E cores are much faster than P-core second threads, in addition to faster ST performance from the P cores due to more transistors being spent on it, and in addition to there NOT being L2 contention on the P cores between threads!

It is win-win everywhere but marketing!
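Whether 6P+24E without HT really beats 8P+16E with HT at full load depends on how an E core compares with a P core's second thread. A rough sketch of that trade-off follows; the model and every parameter in it (the SMT gain per P core, the ST gain from dropping HT, the relative E-core throughput) are assumptions for illustration, not measurements.

```python
def threads(p_cores: int, e_cores: int, ht: bool) -> int:
    """Logical threads: P cores count twice when HT is enabled."""
    return p_cores * (2 if ht else 1) + e_cores


def full_load_throughput(p_cores, e_cores, *, smt_gain=0.0, st_gain=0.0, e_rel=0.5):
    """Throughput in units of one stock P core running a single thread.

    smt_gain: extra throughput a P core gets from its second thread (assumed)
    st_gain:  P-core speedup from reinvesting the HT transistors (assumed)
    e_rel:    throughput of one E core relative to a stock P core (assumed)
    """
    return p_cores * (1.0 + st_gain) * (1.0 + smt_gain) + e_cores * e_rel


def break_even_e_rel(smt_gain: float, st_gain: float) -> float:
    """E-core relative throughput at which 6P+24E (no HT) matches 8P+16E (HT)."""
    # 8*(1 + smt_gain) + 16*e = 6*(1 + st_gain) + 24*e, solved for e
    return (8 * (1.0 + smt_gain) - 6 * (1.0 + st_gain)) / (24 - 16)


SMT_GAIN = 0.30  # placeholder, not a measured value
ST_GAIN = 0.03   # low end of the 3-5% suggested in the thread

print("threads:", threads(8, 16, ht=True), "vs", threads(6, 24, ht=False))
print("break-even e_rel:", round(break_even_e_rel(SMT_GAIN, ST_GAIN), 4))
```

In this simplified model the proposed layout wins at full load only if an E core delivers more than the break-even fraction of a P core's throughput; below that, the stock layout stays ahead.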
 

TheELF

Diamond Member
Dec 22, 2012
3,973
731
126
"1) eliminate HT and use that circuit and transistor cost for additional ST throughput."
Yes, and also remove body fat from the belly and use that to make somebody taller...

Adding circuitry and transistors to a core does not make it faster. You have to come up with ways to do instructions in fewer clock cycles, which is something the industry has been refining for 40+ years now and there isn't much left on that front, or you have to increase the number of clock cycles per unit of time to make things faster.

Having a higher CB23 single-core score is not being faster; it is having more throughput, which is the same as having more throughput on additional cores.
If you need single-threaded speed, you need it because of things that have only one single thread to run, so you can't leverage extra cores or extra IPC on the same core; it only uses as much as it uses.
 

Thunder 57

Platinum Member
Aug 19, 2007
2,675
3,801
136
"Sorry, but I don't think that's ever happening (unless Intel creates a die with a sea of 100 cores or something). Let's suppose Intel has 32 P-cores in their flagship consumer part in the future. Marketing will complain if they remove HT and the illusion of double the number of cores. 64 looks a lot more powerful than 32 on paper. It's like silicone implants. First impressions matter."

Liked for correct usage of the word silicone. Now if only people could correctly use the words "lose" and "losing". So many times I see "loosing".
 

scineram

Senior member
Nov 1, 2020
361
283
106
I think high-IPC Arm cores lack SMT for a reason. Apple, for example, would rather have a single thread utilise the core resources as much as possible than try to fill the core up with another thread.
 

LightningZ71

Golden Member
Mar 10, 2017
1,628
1,898
136
"Yes, and also remove body fat from the belly and use that to make somebody taller...

Adding circuitry and transistors to a core does not make it faster. You have to come up with ways to do instructions in fewer clock cycles, which is something the industry has been refining for 40+ years now and there isn't much left on that front, or you have to increase the number of clock cycles per unit of time to make things faster.

Having a higher CB23 single-core score is not being faster; it is having more throughput, which is the same as having more throughput on additional cores.
If you need single-threaded speed, you need it because of things that have only one single thread to run, so you can't leverage extra cores or extra IPC on the same core; it only uses as much as it uses."
Someone REALLY needs to inform Apple about the "fact" that you can't increase ST throughput by throwing more transistors at the problem! As we all are well aware, increasing the L1 caches has zero effect on ST, as does expanding the OoO window with larger buffers. Making more microcoded instructions into hardware circuits does nothing for throughput either. Going wider hasn't ever helped either.

The fact is that defending against memory and processing integrity vulnerabilities is requiring more and more transistors and costing base-level performance in the process. A large number of those vulnerabilities are associated with SMT implementations. Eliminating SMT also removes a lot of those issues and eliminates the extra transistors spent mitigating and protecting against them. There are many security-focused VM hosts out there that disable SMT as a matter of course already, so any transistors spent on SMT are useless to them already.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
I don't see the point in getting rid of SMT if the performance trend for software is towards increased parallelism. I can't really think of any application that depends purely on single-threaded performance anymore. Even the most basic applications, like browsers, which everyone uses, have been multithreaded for years now and have become very sophisticated in how they leverage modern CPUs and GPUs.
 

LightningZ71

Golden Member
Mar 10, 2017
1,628
1,898
136
"I don't see the point in getting rid of SMT if the performance trend for software is towards increased parallelism. I can't really think of any application that depends purely on single-threaded performance anymore. Even the most basic applications, like browsers, which everyone uses, have been multithreaded for years now and have become very sophisticated in how they leverage modern CPUs and GPUs."
I made this point before: though many applications are multithreaded, they still commonly have precious few threads, often just one or two, that are "performance critical", in that the performance of those specific threads is the main limiting factor with respect to user experience or application productivity. The other threads tend to be housekeeping, background processes and work-ahead prestaging of information which, while important, is not compute-heavy and isn't performance-determinant.

If we can make those performance critical threads more performant while still maintaining sufficient MT throughput via E cores, we improve more cases for everyone.
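A toy model of that argument: if an application's progress is gated by one critical thread, its time per unit of work is the slower of "critical thread on one core" and "background work spread over everything else". The model and all numbers below are illustrative assumptions, not measurements of any real application.

```python
def time_per_unit(critical_work, background_work, st_speed, mt_throughput):
    """Time for one unit of app work when a single thread is performance critical.

    critical_work:   work that must run on one thread (arbitrary units)
    background_work: housekeeping work that parallelises freely
    st_speed:        speed of the core running the critical thread
    mt_throughput:   combined throughput available to the background work
    """
    return max(critical_work / st_speed, background_work / mt_throughput)


# Placeholder workload: the critical thread dominates.
baseline = time_per_unit(10.0, 20.0, st_speed=1.0, mt_throughput=4.0)
faster_st = time_per_unit(10.0, 20.0, st_speed=1.03, mt_throughput=4.0)
more_mt = time_per_unit(10.0, 20.0, st_speed=1.0, mt_throughput=8.0)

print(f"baseline: {baseline:.2f}")
print(f"+3% ST:   {faster_st:.2f}")
print(f"2x MT:    {more_mt:.2f}")
```

In this sketch, doubling the background throughput changes nothing while the critical thread is the bottleneck, whereas a small ST gain shortens every unit of work. The picture flips for workloads where the background work dominates.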
 

TheELF

Diamond Member
Dec 22, 2012
3,973
731
126
"Someone REALLY needs to inform Apple about the 'fact' that you can't increase ST throughput by throwing more transistors at the problem! As we all are well aware, increasing the L1 caches has zero effect on ST, as does expanding the OoO window with larger buffers. Making more microcoded instructions into hardware circuits does nothing for throughput either. Going wider hasn't ever helped either."
Stop bringing up Apple until they can run Windows and have 100% software compatibility...
Anybody can make an extremely optimized CPU core if they don't have to care about running (all) software on it.
L1, OoO and wider cores have nothing to do with removing HT/SMT; they are things that happen in parallel.
They don't have to remove HT to make space for these things. The F versions of CPUs lack the whole iGPU, so they would have all the space and all the saved transistor budget you would ever be able to use up.
"The fact is that defending against memory and processing integrity vulnerabilities is requiring more and more transistors and costing base-level performance in the process. A large number of those vulnerabilities are associated with SMT implementations. Eliminating SMT also removes a lot of those issues and eliminates the extra transistors spent mitigating and protecting against them. There are many security-focused VM hosts out there that disable SMT as a matter of course already, so any transistors spent on SMT are useless to them already."
Exactly, you can just disable it if it causes any slowdowns for you, and everybody else can still enjoy an up to 100% increase in throughput.