Question Intel 12th to 13th generation performance comparison


GunsMadeAmericaFree

Golden Member
Jan 23, 2007
1,374
375
136
Intel13thGenRefresh.jpg


I thought this was an interesting read - benchmark comparisons between Intel 12th generation & 13th generation:

Article with details

That's an average performance increase of 47% from one generation to the next. I wonder if AMD will have a similar increase?
 

JustViewing

Senior member
Aug 17, 2022
267
470
106
Gracemont IPC is ~equal to Skylake IPC. So 1T Gracemont vs 1T Skylake at the same clocks will perform similarly.
That is when code doesn't use AVX256 and memory pressure is low.

Skylake with HT will perform better than a single E-core in normal workloads and games.

I'm not sure why you conclude that there's no reason to go wide without SMT. The widest and highest IPC mainstream CPU arch you can buy right now is Apple's big cores, which are significantly wider and higher IPC than Intel or AMD's, yet lack SMT.

If Apple added SMT to their CPUs, they could increase their MT performance at little or no cost.
 

LightningZ71

Platinum Member
Mar 10, 2017
2,139
2,590
136
But they do increase IPC by 3-5% every gen, and they do that even while spending all of these resources on HT...
Yes, of course: they improve the basic logical design of the core and add resources to it, then make the additions needed so that HT can still take advantage of those new and expanded features. They are getting 3-5% while also devoting additional resources to HT. What we are proposing is, instead of allocating additional resources to HT, to spend them on making the ST performance of the core better. That way, the generational improvement would be MORE THAN the 3-5% per generation.
 

Starjack

Member
Apr 10, 2016
25
0
66
That is when code doesn't use AVX256 and memory pressure is low.

Skylake with HT will perform better than a single E-core in normal workloads and games.

I ask because Notebookcheck says of the processor in my current laptop, the Core i3-1215U with 2 P-cores and 4 E-cores, that the E-cores perform similarly to old Skylake cores, comparable to the Core i7-6700HQ. They never mention whether that is with HT enabled or disabled (single-threaded).
 

JustViewing

Senior member
Aug 17, 2022
267
470
106
I ask because Notebookcheck says of the processor in my current laptop, the Core i3-1215U with 2 P-cores and 4 E-cores, that the E-cores perform similarly to old Skylake cores, comparable to the Core i7-6700HQ. They never mention whether that is with HT enabled or disabled (single-threaded).
E-cores don't have native AVX256 units; in that respect they are like Zen 1. Also, we don't know how a Skylake ported to Intel 10nm+++/7 would perform, or what size it would be.
 

Kocicak

Golden Member
Jan 17, 2019
1,177
1,232
136
If Apple added SMT to their CPUs, they could increase their MT performance at little or no cost.

Then why haven't they?
Well, it was not worth it for whatever reason: Too expensive, it would take too long, it would decrease performance in some aspect they did not want, it would interfere with some other function, it would increase complexity too much, it would bring some vulnerability, it would be awful and inconvenient, etc.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,105
136
E-cores don't have native AVX256 units; in that respect they are like Zen 1. Also, we don't know how a Skylake ported to Intel 10nm+++/7 would perform, or what size it would be.
It's like how Zen 4 handles AVX-512, but you don't see people complaining about that, do you? It clearly works well enough in practice. The benchmarks speak for themselves.

And porting Skylake to another process wouldn't change its IPC, nor would it scale down to close to Gracemont size given known 10nm/Intel 7 scalars.
 

TheELF

Diamond Member
Dec 22, 2012
4,027
753
126
Needless to say how shocking this is to me, to realize, that I got something as fundamental as SMT wrong for more than 2 decades - after having spent countless hours of learning about technical aspects of CPUs. Somehow the assumption was burned into my brain, that the second thread only gets the resources that the prime one does not use - and that is also how Intel was explaining it in Layman's terms, when they introduced it. Also, most articles only mention the total benefit of SMT but do not go into detail if the throughput of these two threads is symmetrical or not.
But the positive take-away for me is that I learned (and proved) something fundamental today. Thanks for bringing me on that path ;)
This happens because you start both instances with the same priority so task manager assigns the same amount of resources to each.
A thread only takes all the resources it can if it runs at a higher priority than anything else on that core, or on the whole CPU if enough threads are running.
Try your experiment again but change the priority of one to real-time and the other to idle.
 

TheELF

Diamond Member
Dec 22, 2012
4,027
753
126
What we are proposing is, instead of allocating additional resources to HT, to spend them on making the ST performance of the core better.
But you are assuming that this is possible without any basis in reality or any facts. You just believe it, so it must be true. You can't possibly know whether the IPC improvement they get each gen isn't already the maximum possible, no matter what they do.
 

Starjack

Member
Apr 10, 2016
25
0
66
E-cores don't have native AVX256 units; in that respect they are like Zen 1. Also, we don't know how a Skylake ported to Intel 10nm+++/7 would perform, or what size it would be.

Well, even with slight architecture changes, the Gracemont cores somehow achieve the IPC of Skylake cores. That is, unless someone on this forum has done a valid comparison of these competing architectures, which is what I was trying to find out.

"It's like how Zen 4 handles AVX-512, but you don't see people complaining about that, do you? It clearly works well enough in practice. The benchmarks speak for themselves.

And porting Skylake to another process wouldn't change its IPC, nor would it scale down to close to Gracemont size given known 10nm/Intel 7 scalars."


Well, I want to believe this as well.
 

Kocicak

Golden Member
Jan 17, 2019
1,177
1,232
136
But you are assuming that this is possible without any basis in reality or any facts. You just believe it, so it must be true. You can't possibly know whether the IPC improvement they get each gen isn't already the maximum possible, no matter what they do.
Even if they did nothing more than use the freed space to make the caches larger, this elementary move would help performance. And there are many improvements they can make just by adding more of the functional units that already exist. And by making things simpler, they can more easily implement new ways of doing things.

This is not hard to imagine.
 

DrMrLordX

Lifer
Apr 27, 2000
22,536
12,403
136
I believe that it was recently necessary to disable SMT in some Intel CPUs due to some security problem? I do not follow the news that closely, I am not sure about this.

Basically, yes and no. Yes, disabling HT could save you from some of the many exploits, but no, it didn't save you from all of them, and no, it didn't have to be that way. For example, Raptor Lake should have had most of those vulnerabilities patched or mitigated to the point that disabling HT would provide you with no added safety. But we were discussing ST performance (not Spectre/Meltdown etc.), so that's a bit of a derail.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
What we are proposing is, instead of allocating additional resources to HT, to spend them on making the ST performance of the core better. That way, the generational improvement would be MORE THAN the 3-5% per generation.

why-not-both-linus-both.gif


Golden Cove had about a 20% IPC gain over Rocket Lake, and if I recall the gain over Skylake was over 40%, all without sacrificing SMT.
 
Jul 27, 2020
24,172
16,859
146
This is not hard to imagine.
It would be helpful if you told us HOW they could make things simpler, from an engineering point of view, with details about the actual execution units and what needs to be changed or enhanced in those units. What you are doing is wishful thinking. Your thoughts are perpendicular to the reality plane.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,105
136

DrMrLordX

Lifer
Apr 27, 2000
22,536
12,403
136
Golden Cove had about a 20% IPC gain over Rocket Lake, and if I recall the gain over Skylake was over 40%, all without sacrificing SMT.

Exactly. Thank you.

The amount of die area and complexity imposed by HT is apparently not very large. Intel has been implementing HT successfully for years (stupid silicon-level vulnerabilities aside). It's highly doubtful they could have made a better, faster Golden Cove by just getting rid of SMT/HT. AMD gained a lot by abandoning CMT which was holding back their ST performance.

Golden Cove (and by extension, Raptor Cove) is not an area-efficient design, yes, this is true. But we can't really place the blame at the feet of HT.

Nah, Skylake's IPC gain was actually very modest. Anandtech only found +5.7% vs Haswell, and that's ignoring Broadwell.

He meant the gain of Golden Cove over Skylake, not the gain of Skylake over Haswell.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,105
136
It would be helpful if you told us HOW they could make things simpler, from an engineering point of view, with details about the actual execution units and what needs to be changed or enhanced in those units. What you are doing is wishful thinking. Your thoughts are perpendicular to the reality plane.
AMD once published a good diagram about how SMT resource allocation works.

1671668251157.png

Anything that's statically partitioned means that each thread only gets half the overall resources. Shared with tagging implies extra logic/memory for those tags. Any prioritization implies extra logic to handle that prioritization. "Competitively shared" means that there's a zero-sum game for resource allocation between the two. And that's just physical hardware. The engineering effort to design such a system is probably the bigger component.
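To make the difference between those sharing policies concrete, here is a toy sketch. It is only a model of the wording above, not of any real core: the `Structure` class, the `entries_available` function, the 64-entry size, and the policy names are all made up for illustration and are not taken from AMD's diagram.

```python
from dataclasses import dataclass


@dataclass
class Structure:
    """A hypothetical core structure (some queue or buffer) with a fixed size."""
    name: str
    entries: int
    policy: str  # "static" or "competitive" (illustrative labels only)


def entries_available(s: Structure, smt_active: bool, sibling_in_use: int = 0) -> int:
    """Entries one thread can use, given what its sibling thread occupies."""
    if not smt_active:
        # With SMT off, the single thread sees the whole structure.
        return s.entries
    if s.policy == "static":
        # Statically partitioned: a fixed half, even if the sibling is idle.
        return s.entries // 2
    if s.policy == "competitive":
        # Competitively shared: zero-sum with whatever the sibling holds.
        return max(0, s.entries - sibling_in_use)
    raise ValueError(f"unknown policy: {s.policy}")


static_q = Structure("example_static_queue", 64, "static")
shared_q = Structure("example_shared_queue", 64, "competitive")

print(entries_available(static_q, smt_active=False))                    # 64
print(entries_available(static_q, smt_active=True))                     # 32
print(entries_available(shared_q, smt_active=True, sibling_in_use=48))  # 16
```

The tagging and prioritization logic mentioned above is deliberately left out; the sketch only shows why a thread can end up with fewer entries once a sibling is active.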
 
Jul 27, 2020
24,172
16,859
146
The engineering effort to design such a system is probably the bigger component.
That's the thing Kocicak is not understanding. With so much work already done and SMT implementations quite mature, which of these two x86 behemoths would throw their years of hard work out the window, just like that, in a world that's moving to multicore workloads more and more? They would only do that if SMT somehow becomes an impediment to higher speeds. But as Carfax83 pointed out, SMT reduces the impact of branch mispredictions that increase with higher speeds. The only way they would leave SMT out is if they redesigned everything from scratch. So maybe (BIG maybe) Lunar/Nova Lake may not have SMT. But with the way these two companies are obsessed with CBR23 scores, I see SMT4 and SMT8 as more likely.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,982
15,935
136
That's the thing Kocicak is not understanding. With so much work already done and SMT implementations quite mature, which of these two x86 behemoths would throw their years of hard work out the window, just like that, in a world that's moving to multicore workloads more and more? They would only do that if SMT somehow becomes an impediment to higher speeds. But as Carfax83 pointed out, SMT reduces the impact of branch mispredictions that increase with higher speeds. The only way they would leave SMT out is if they redesigned everything from scratch. So maybe (BIG maybe) Lunar/Nova Lake may not have SMT. But with the way these two companies are obsessed with CBR23 scores, I see SMT4 and SMT8 as more likely.
I think that we should ignore Kocicak. First he says that CPUs that take over 100 watts should be outlawed, then he buys a 13900K, then a ..... (a bunch of different CPUs, and returns them). I can't take this kind of user seriously. We all know SMT works to our advantage most of the time; let's just ignore this guy.
 

Exist50

Platinum Member
Aug 18, 2016
2,452
3,105
136
With so much work already done and SMT implementations quite mature, which of these two x86 behemoths would throw their years of hard work out the window, just like that, in a world that's moving to multicore workloads more and more?
That's the sunk cost fallacy. The question is whether the incremental effort needed to maintain/update SMT is worth having the feature at all. The world is moving to multicore workloads, but those are being shifted to different cores altogether. Like, disabling SMT on Haswell made a huge difference. On Raptor Lake? Meh.
They would only do that if SMT somehow becomes an impediment to higher speeds.
That's precisely the argument. Or rather, that implementing SMT is inherently trading off some amount of IPC/single thread performance.
But as Carfax83 pointed out, SMT reduces the impact of branch mispredictions that increase with higher speeds.
That's not really how it works. It can gain more throughput by keeping the core busy while a thread is waiting, but does nothing for single thread performance. The opposite, really.
The only way they would leave SMT out is if they redesigned everything from scratch. So maybe (BIG maybe) Lunar/Nova Lake may not have SMT.
That, I'd agree with. So on the Intel side, either Lion Cove, or if not that, then Royal. AMD, tougher to say. Maybe Zen 6 or 7.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
Nah, Skylake's IPC gain was actually very modest. Anandtech only found +5.7% vs Haswell, and that's ignoring Broadwell. Other sites had somewhat better results with better memory (DDR4), but nothing even close to 40%.


I meant the jump from Skylake to Golden Cove. I think it was over 40% if memory serves. BTW, I just encoded a 4K 60 FPS video with Handbrake x265 with HT on and HT off.

HT off results:

encoded 11849 frames in 658.65s (17.99 fps), 8831.63 kb/s, Avg QP:31.40

HT on results:

encoded 11849 frames in 600.86s (19.72 fps), 8831.63 kb/s, Avg QP:31.40

So HT on completed the task in about 8.8% less time (roughly 9.6% more fps). Doesn't seem like much until you realize that this workload pegged all 32 threads on my CPU at 100% load, and with HT off, all 24 threads were also pegged at 100%.

This is a highly threaded workload of course, but my point is that having HT on still made the CPU notably faster and more efficient against 24 physical cores.
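For anyone who wants to check the arithmetic, here is a minimal sketch that recomputes the comparison from the two log lines above. The frame count and run times are copied from those lines; the variable and function names are just illustrative. Note that the percentage differs slightly depending on whether you measure time saved or throughput gained.

```python
# Figures copied from the two Handbrake x265 log lines quoted above.
FRAMES = 11849
SECONDS_HT_OFF = 658.65
SECONDS_HT_ON = 600.86


def percent_change(new: float, old: float) -> float:
    """Return the change from old to new as a percentage of old."""
    return (new - old) / old * 100.0


fps_off = FRAMES / SECONDS_HT_OFF
fps_on = FRAMES / SECONDS_HT_ON

# Time saved is measured against the HT-off run time.
time_saved_pct = -percent_change(SECONDS_HT_ON, SECONDS_HT_OFF)
# Throughput gain is measured against the HT-off frame rate.
throughput_gain_pct = percent_change(fps_on, fps_off)

print(f"HT off: {fps_off:.2f} fps, HT on: {fps_on:.2f} fps")
print(f"Time saved with HT on: {time_saved_pct:.1f}%")
print(f"Throughput gain with HT on: {throughput_gain_pct:.1f}%")
```

The recomputed frame rates match the 17.99 and 19.72 fps reported in the logs.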