Almost 20% (18%?) increase in IPC from SB to Haswell, or am I deluded/personally biased?
Which is not that much.
You are not biased, but that isn't really a substantial improvement.
It's alright.
No need to repeat myself, I'll just point you to the same conversation I've been having with people since forever.
http://forums.anandtech.com/showthread.php?t=2329338&highlight=hyper
It's a good showing for the 8350, but even then it's basically right there with the slower-clocked 4-core 4670K. OC them both and the 4670K leaves it behind while using quite a bit less power. Heck, even the 2500K would pass the 8350 with both OCed. And again, this is about as good as the 8350 can look.
That's... part of my explanation. There are cases where you can neatly fit your load into the execution units so there's nothing left for the second thread.
But unless you're doing something as "easy" as Linpack, that's not going to happen for most code. It's simply not possible to optimize to that point.
x264 has a lot of hand-crafted assembly, for example. It's so fast that on its fastest setting, it can match the pace of Quicksync (dedicated encoder) but yield similar or better quality. And even then, it benefits from Hyperthreading.
The game is using all eight cores of the CPU (although two cores are receiving a moderately greater load than the other six).
If they were "Intel Optimized" or "Intel Biased", you wouldn't be able to extract a large performance increase by enabling Hyperthreading.
But that's the thing, there are some (real) workloads that are impossible to optimize to the point that Hyperthreading doesn't help.
Something simple like Linpack can do it, but as general-purpose processors, CPUs face loads that are too complex for that.
What if you have code that's not math heavy but lots of branching and data loading? OOE is limited because of the dependencies so while your pipeline is stalled, a second thread has resources to play with. Net benefit. You can't optimize that kind of thing away.
The way SMT works, if one thread is stalled doing its thing, the free math assets could be used by the other thread.
Linpack can easily fill a CPU's math assets since it's just doing math questions really quickly. You don't have to worry about dependencies or anything, just pack the math together and let the CPU chew through it. It should be easy to see why a single thread could then use up all the math assets in a superscalar CPU.
But what if you're running mixed code, like a game where you have AI decisions (branching) as well as physics code (math), for instance? Now you can use the CPU's resources more efficiently with two threads. Use both the branching crap and the math stuff at the same time.
The idea here isn't that the first thread is any faster (if anything it's slower sometimes) but because you can do additional work on the second thread, the net benefit is positive.
And if you don't have the second thread, then the first thread gets all of the CPU and runs at the same speed as if the CPU weren't SMT enabled.
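To put some rough numbers on that argument, here's a toy model. It is only a sketch under loud assumptions: the 4-wide core, the per-cycle "demand" lists and the function name `run_smt` are all made up for illustration, and real SMT scheduling is far more complicated. Each list entry is how many execution slots a thread wants in its next step, and 0 stands for a stall cycle.

```python
def run_smt(a, b, width):
    """Cycles needed to finish two threads sharing `width` slots per cycle.

    Toy model (hypothetical, not how any real core schedules work):
    a thread's step advances only if all of its slots fit into what is
    left of the current cycle. A demand of 0 models a stall cycle.
    """
    if any(d > width for d in a + b):
        raise ValueError("a single step cannot want more slots than the core has")
    ia = ib = cycles = 0
    while ia < len(a) or ib < len(b):
        free = width
        # Alternate priority every cycle so neither thread starves.
        order = ("a", "b") if cycles % 2 == 0 else ("b", "a")
        for name in order:
            if name == "a" and ia < len(a) and a[ia] <= free:
                free -= a[ia]
                ia += 1
            elif name == "b" and ib < len(b) and b[ib] <= free:
                free -= b[ib]
                ib += 1
        cycles += 1
    return cycles


WIDTH = 4                             # assumed issue width
dense = [4] * 8                       # Linpack-like: fills every slot, never stalls
branchy = [1, 0, 0, 2, 0, 0, 1, 0]    # branch/load heavy: mostly waiting
math = [3] * 8                        # physics-like: busy, but leaves one slot free

# Running the two threads back to back on one core costs len(a) + len(b) cycles.
print("dense + dense    :", run_smt(dense, dense, WIDTH), "vs", len(dense) * 2)
print("branchy + branchy:", run_smt(branchy, branchy, WIDTH), "vs", len(branchy) * 2)
print("branchy + math   :", run_smt(branchy, math, WIDTH), "vs", len(branchy) + len(math))
```

In this model, two dense threads take just as long together as they would back to back, since there is never a spare slot for the other thread. The stall-heavy and mixed pairs finish in fewer total cycles, even though an individual thread sometimes has to wait a cycle for the other one.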
For example x264 has a lot of hand-crafted assembly. It's so fast that on its fastest setting, it can match the pace of Quicksync (dedicated encoder) but yield similar or better quality. And even then, it gains benefit from Hyperthreading.
This is nonsense unless you compare a high clocked 8 core Ivy Bridge-EP with a Mainstream Haswell using Quicksync. Quicksync VBR+mbbrc on Haswell i5 is twice as fast as ultrafast preset x264 with better quality.
Sounds like a problem with how you are dispatching (your program logic) more than the core itself being unable to execute out of order.
What you're talking about sounds like the tax for continuing backwards compatibility with x86 instructions.
It would probably be easier if the programmers coded to the actual RISC co-processors in the cores.
Eh, last time I read reviews of Quicksync (Ivy Bridge) this was the case. I just got a new i7-4700k CPU so I'll do some tests this weekend. I don't know if they've improved Quicksync performance though (although x264 got some gains with AVX2, maybe they'll balance out).
They don't because they can't.
No, I was speaking about CPUs in general without Intel particularly in mind. You'll note that I had mentioned POWER as another CPU that heavily uses SMT.
Dependencies and stalls are a fact of life when executing out of order (since that still operates on a single thread).
Use Handbrake for Quicksync. I recommend TU4 and VBR as a starting point, and make sure gop-ref-dist is on default (3). If you want something really fast, try TU7.
Gains from AVX2 were relatively small, in the 1-5% range.
No CRF modes available? I can't see the "VBR" being anything other than ABR if it's still a single pass. In any case, it's easy enough to test speed.
Ugh, Intel didn't make it easy to use Quicksync if you have another video card as your primary. I was hoping to niggle it into action tonight but I'll have to work it out later.
Hmm, I'm on Windows 8.1 and have the 15335 drivers installed. Still giving me a Code 43 error when I try to enable it.
A quick Google search turns up some people saying I may have had to install Windows with the iGFX as my primary in the first place. Hopefully that guy is just wrong.
You're in single player, though.
Compare the 780 CPU results to the R290X results in multiplayer...
Every CPU takes a big hit; the 8350 at 5GHz only manages 51 fps with an R290X.
