The real state of HT technology on Haswell core

Lepton87

Platinum Member
Jul 28, 2009
2,544
9
81
Lots of people on this forum go on saying that Haswell bucks the trend of diminishing returns from enabling HT that has held ever since SB, because Intel's been busy improving all the important areas responsible for IPC like branch prediction with every iteration so each new or refreshed core could better utilize its execution units with just one thread and as that trend continued it left HT with fewer idle execution units for the logical cores which inevitably led to smaller performance gains, that was certainly true from Nehalem to Ivy Bridge, I have seen tests that confirm this but as I said before people go on saying that HW has a better implemented HT technology that provides bigger benefits to performance than any of its predecessors. But I have never seen tests that would confirm this.
 

Lepton87

Platinum Member
Jul 28, 2009
2,544
9
81
Holy run-on sentence Batman!

What?
UPDATE: I've just checked what it is.
A RUN-ON SENTENCE (sometimes called a "fused sentence") has at least two parts, either one of which can stand by itself (in other words, two independent clauses), but the two parts have been smooshed together instead of being properly connected. Review, also, the section which describes Things That Can Happen Between Two Independent Clauses.

It is important to realize that the length of a sentence really has nothing to do with whether a sentence is a run-on or not; being a run-on is a structural flaw that can plague even a very short sentence:

Sorry, I didn't realize it was incorrect. I always try my best to write English properly but despite my best efforts I often make mistakes. I apologize and next time I'll try to connect the clauses in my sentences correctly or I'll just split those into multiple sentences.
 

NTMBK

Lifer
Nov 14, 2011
10,461
5,845
136
No worries, your English is still a lot better than my Polish ;) I've just about managed to remember "cześć", "tak" and "nie"! But breaking up your post with punctuation and spacing makes it easier to read, and makes people more likely to respond. :thumbsup:
 

coercitiv

Diamond Member
Jan 24, 2014
7,393
17,538
136
as I said before people go on saying that HW has a better implemented HT technology that provides bigger benefits to performance than any of its predecessors. But I have never seen tests that would confirm this.

Take this Anandtech review of the Anniversary Edition Pentium - G3258.

In multithreaded scenarios the i3 @ 3.5GHz is equal to or better than the Pentium @ 4.7GHz. That means HT compensates for at least a 35% jump in frequency.

Would you say that is better or worse than previous jumps in HT performance?
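
For anyone who wants to check the frequency gap, here is a quick sketch of the arithmetic. The clocks are the ones quoted above; the function and variable names are just made up for illustration.

```python
# Clock speeds as quoted in the post above (GHz).
i3_clock_ghz = 3.5       # i3 with HT enabled
pentium_clock_ghz = 4.7  # overclocked Pentium G3258, no HT


def clock_advantage(fast_ghz: float, slow_ghz: float) -> float:
    """Relative clock advantage of the faster chip, e.g. 0.34 means 34%."""
    return fast_ghz / slow_ghz - 1.0


advantage = clock_advantage(pentium_clock_ghz, i3_clock_ghz)
print(f"Pentium clock advantage over the i3: {advantage:.1%}")
```

That comes out a little under the 35% quoted, which is close enough for this kind of comparison.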
 

Qwertilot

Golden Member
Nov 28, 2013
1,604
257
126
You'd think the bench scores on the main site would do this. Just check i5 vs i7 over a couple of generations and see if the difference has grown or shrunk?

Didn't look obvious to me at a glance :)
(and maybe the odd other thing too.).
 

Lepton87

Platinum Member
Jul 28, 2009
2,544
9
81
Take this Anandtech review of the Anniversary Edition Pentium - G3258.

In multithreaded scenarios the i3 @ 3.5GHz is equal to or better than the Pentium @ 4.7GHz. That means HT compensates for at least a 35% jump in frequency.

Would you say that is better or worse than previous jumps in HT performance?

It would be a bit like comparing apples to oranges, because the software used in that review differs from the software used in the reviews of older-generation HT-enabled Intel CPUs. I remember an early HT-enabled Xeon review (probably Nehalem, but I'm not sure, it could have been an SB Xeon) on Anandtech where the writer said that across their benchmark suite an HT-enabled core performed like 1.4 physical cores, so the performance figures are very similar across very different software. It seems that the Xeon benefited more, but server software is different. A 35% jump in frequency doesn't mean a 35% jump in performance, more like 30%.
 

Squeetard

Senior member
Nov 13, 2004
815
7
76
My G19's LCD screen has a bar graph of all 8 cores (4 physical and 4 hyperthreaded) that shows usage in real time.
I play ArcheAge (CryEngine 3) and the 4 physical cores get used a lot, with the odd small bump in a few virtual cores. A year or two ago, one or two cores would be pinned and the rest idle.
Playing Dragon Age: Inquisition (Frostbite engine), all 8 cores are getting pinned. I've never seen a game use this much CPU; it's like I am running the Prime95 torture test. Temps show it too.
Games are getting better at using all available CPU cores, be they physical or virtual. I'm guessing developing for the new consoles has influenced this.
 

Lepton87

Platinum Member
Jul 28, 2009
2,544
9
81
You'd think the bench scores on the main site would do this. Just check i5 vs i7 over a couple of generations and see if the difference has grown or shrunk?

Didn't look obvious to me at a glance :)
(and maybe the odd other thing too.).

Are they tested with the exact same software? If not, newer CPUs could have been tested with better-threaded software, possibly making HT gains look better compared to older CPUs. They have to be tested with the same software and with the same amount of cache, so not an i5 vs an i7, only an i7 vs the same i7 with HT disabled. It's just my speculation that newer software would be better threaded, but there's more to it than how well threaded an app is. The other important factor is optimization: if newer software can sustain better IPC, then there will be fewer execution resources left for the logical core, so I would be hesitant to assert that all newer and better-threaded software benefits more from HT than older software. As I said, another flaw of such comparisons is that they don't isolate HT as the only variable: except for the i5-750 and i5-760, all the other desktop i5s have part of their L3 cache disabled.
PS. Am I correct in assuming that games are usually quite branchy, low-IPC software? Maybe not extremely so, but the average IPC in most games is on the lower end of the software spectrum, isn't it? They also aren't very easy targets for parallelization, but a game can be written to make heavy use of more than 4 threads; even now there are examples of such games.
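
To make the like-for-like comparison above concrete (same chip, same software, HT on vs HT off), here is a rough sketch of how the uplift could be tabulated per generation. The class, the field names and especially the scores are placeholders I made up to show the calculation; they are not measurements.

```python
from dataclasses import dataclass


@dataclass
class BenchResult:
    cpu: str             # the same physical chip is used for both runs
    benchmark: str       # the same software and version for both runs
    score_ht_off: float  # higher-is-better score with HT disabled
    score_ht_on: float   # higher-is-better score with HT enabled


def ht_uplift(result: BenchResult) -> float:
    """Relative gain from enabling HT on one chip, e.g. 0.25 means +25%."""
    if result.score_ht_off <= 0:
        raise ValueError("score_ht_off must be positive")
    return result.score_ht_on / result.score_ht_off - 1.0


# Placeholder numbers purely to show the calculation, NOT real benchmark data.
results = [
    BenchResult("cpu_gen_a", "bench_x", score_ht_off=100.0, score_ht_on=120.0),
    BenchResult("cpu_gen_b", "bench_x", score_ht_off=110.0, score_ht_on=137.5),
]

for r in results:
    print(f"{r.cpu} / {r.benchmark}: HT uplift {ht_uplift(r):+.1%}")
```

Because each uplift is a ratio taken on a single chip, differences in clocks and cache between generations drop out, and only the HT on/off difference is left to compare.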
 

cytg111

Lifer
Mar 17, 2008
26,248
15,662
136
Take this Anandtech review of the Anniversary Edition Pentium - G3258.

In multithreaded scenarios the i3 @ 3.5GHz is equal to or better than the Pentium @ 4.7GHz. That means HT compensates for at least a 35% jump in frequency.

Would you say that is better or worse than previous jumps in HT performance?

I think the argument is still valid. IIRC (I know this has been debated before, but not conclusively), HT works when you have a stall, such as a cache miss or a branch prediction gone wrong: instead of the core idling while Thread-0 waits to get data from memory, Thread-0 is swapped out and the 'hyper thread' Thread-1 is swapped in to get a little something done before daddy (Thread-0) returns home again.
So by eliminating bottlenecks with larger caches (less time daddy ain't home) and better branch prediction (daddy never leaves the house), HT performance should be dropping from arch to arch.
If this is not happening, then why not? :) How does a modern HT architecture work? I can't imagine a healthy running thread getting swapped out to make room... that would be counterproductive.
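
Here is a toy version of that argument, not a description of how Intel actually implements HT. As far as I know HT is simultaneous multithreading, so both threads can have instructions in flight at the same time rather than one being swapped out. The sketch assumes a core has a fixed number of issue slots per cycle, that one thread alone keeps a fraction u of them busy, and that a second identical thread can fill whatever is left. The function name and the model itself are made up for illustration.

```python
def ht_gain_upper_bound(single_thread_utilization: float) -> float:
    """Best-case throughput gain from a second thread in the toy slot model.

    single_thread_utilization is the fraction of issue slots one thread
    fills on its own (0 < u <= 1). Two identical threads can fill at most
    min(1, 2u) of the slots, so the gain over one thread is min(1, 2u)/u - 1.
    """
    u = single_thread_utilization
    if not 0.0 < u <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    combined = min(1.0, 2.0 * u)
    return combined / u - 1.0


for u in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(f"single-thread utilization {u:.0%}: "
          f"best-case HT gain {ht_gain_upper_bound(u):.0%}")
```

In this model the best-case gain shrinks as a single thread gets better at keeping the core busy, which is the diminishing-returns trend being argued. It ignores shared caches, front-end limits and any extra execution ports a new core adds, so it says nothing about what a real chip will do.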
 

escrow4

Diamond Member
Feb 4, 2013
3,339
122
106
If an engine can use it properly there won't be a problem. My 5930K has 6 cores evenly loaded up @ 3.7GHz in game in Inquisition (never checked the hyperthreaded "cores") - that is well utilised. It all comes down to the app.
 

coercitiv

Diamond Member
Jan 24, 2014
7,393
17,538
136
The i3 also has AVX and AVX2 enabled, and 1MB extra L3 cache, so not quite a fair comparison.
Actually the Pentium has more cache per thread :) The AVX contribution, if significant, does break the comparison unfortunately.

A 35% jump in frequency doesn't mean a 35% jump in performance, more like 30%.
And where did I say otherwise?

Moreover, just to play devil's advocate a bit more, even if HT brings about the same relative benefit as it did in other generations, it's still a better implementation, since it has to compensate for the overall increase in efficiency (less waste, less time in which to bring a benefit).
 

Flapdrol1337

Golden Member
May 21, 2014
1,677
93
91
If an engine can use it properly there won't be a problem. My 5930K has 6 cores evenly loaded up @ 3.7GHz in game in Inquisition (never checked the hyperthreaded "cores") - that is well utilised. It all comes down to the app.

"cores evenly loaded" tells you exactly nothing.

If I run a single thread of prime95 it'll give me evenly loaded cores.
 

Haserath

Senior member
Sep 12, 2010
793
1
81
Haswell should theoretically improve SMT with its extra execution ports, but I think the threads still take turns using the front end, which got some prediction improvements again.

Nosta has said Skylake would feature two front ends per execution back end to enhance HT (yes, his predictions are wild, but this makes sense). Haven't seen anything on Skylake yet either, and it's only 2-3 quarters away from release.
 

2is

Diamond Member
Apr 8, 2012
4,281
131
106
"cores evenly loaded" tells you exactly nothing.

If I run a single thread of prime95 it'll give me evenly loaded cores.

Exactly, and just to go further, that's just the Windows scheduler switching the thread between the two cores very fast, making the graphical representation in Task Manager look like each core is being utilized 50%.
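
A small simulation of that effect, as a sketch only: it assumes one always-busy thread that gets moved to the next core on every time slice. Real schedulers are not strictly round-robin, and the function name is made up for illustration.

```python
def per_core_load(num_cores: int, num_slices: int) -> list[float]:
    """Average utilization per core for ONE busy thread migrated round-robin.

    Each time slice the thread runs on exactly one core; a monitoring graph
    that averages over many slices sees the work spread across all cores.
    """
    busy_slices = [0] * num_cores
    for t in range(num_slices):
        busy_slices[t % num_cores] += 1  # thread runs on this core for slice t
    return [b / num_slices for b in busy_slices]


print(per_core_load(2, 1000))  # [0.5, 0.5]
print(per_core_load(4, 1000))  # [0.25, 0.25, 0.25, 0.25]
```

The total always adds up to one core's worth of work, so evenly loaded graphs on their own don't show how many threads the application is really running.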