[THG] Core i7-4770K: Haswell's Performance, Previewed

Page 12

itsmydamnation

Platinum Member
Feb 6, 2011
2,804
3,266
136
I don't suppose you have links to those Beyond3D articles do you? I'd like to read them to see what they suggest, but their search function is BeyondCrap D:


They are buried deep within the console forum, from before the NeoGAF noobs decided to run amok. But it had your normal sort of people answering: reppi, grall, etc.
 

Pilum

Member
Aug 27, 2012
182
3
81
256bit is, that's 8 32bit floats. There are already lots of 128bit INT SIMD ops in SSE. Wider vectors are exactly the same as "MOAR CORES!?!?!": you still need data to operate on. Some tasks will be much easier than others; there are some good topics about 256bit vectors for games on Beyond3D. Short story is 128bit SIMD is easy, 256bit is possible but generally requires a rethink in the way you write code, and it tends to clash with object-oriented coding.
If you could be a little more specific, this might actually be believable. Generally, for vectorizable code, there isn't a "wall" of vector lane count beyond which vectorization becomes hard. If you can do SIMD, you can usually do it in 4, 8, 16 or more lanes.

And SIMD and object-oriented code will always clash, anyway. OO is a lot of pointer chasing, and SIMD is completely useless there.

However, for physics calculation, higher lane counts may be a problem. You'll usually use 4-way SIMD for doing 3D vector calculations (using only three lanes, ignoring the fourth). So it won't be trivial to use 8-way SIMD on data structures not designed for it. However, this won't be a problem for all engines, as some engines will have been written by non-stupid people who could see that we'd get wider SIMD execution over time. And changing the engine code to accommodate more flexible SIMD lane widths will pay off in the long run, as we'll get wider and wider SIMD engines. Intel's Knights family already uses 16 lanes, and AVX is designed to be easily extendable, up to 1024bits/32 lanes IIRC.

However, it'll take a long time until AVX is widely used, simply because you can't depend on it being there. There are still so many C2D, K10, NHM, etc. systems in the market that it doesn't pay off to support AVX. Once you reach more than 50% market saturation that becomes attractive, but we're still a long way off from reaching this point - especially considering the longer PC replacement cycles we have these days.
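The data-layout point above (three-of-four lanes on 4-way SIMD vs. width-agnostic code) is really the old AoS-vs-SoA question. A minimal C sketch; the struct and function names here are made up for illustration, not from any real engine:

```c
#include <stddef.h>

/* AoS: one struct per particle. The x-values are strided in memory,
   so an 8-wide SIMD load cannot grab 8 consecutive x's. */
struct ParticleAoS { float x, y, z, w; };

/* SoA: each component in its own contiguous array. The loop below
   vectorizes naturally at any lane width (4, 8, 16, ...). */
struct ParticlesSoA {
    float *x, *y, *z;
    size_t n;
};

/* Scale all positions. From this same source, a vectorizing compiler
   can emit SSE, AVX, or AVX-512 code, whatever width the target has. */
void scale_soa(struct ParticlesSoA *p, float s)
{
    for (size_t i = 0; i < p->n; ++i) {
        p->x[i] *= s;
        p->y[i] *= s;
        p->z[i] *= s;
    }
}
```

The point is the layout, not intrinsics: the SoA version never cares whether the hardware is 4-wide or 8-wide, which is exactly why engines designed this way absorb wider SIMD for free.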
 

NTMBK

Lifer
Nov 14, 2011
10,248
5,045
136
If you could be a little more specific, this might actually be believable. Generally, for vectorizable code, there isn't a "wall" of vector lane count beyond which vectorization becomes hard. If you can do SIMD, you can usually do it in 4, 8, 16 or more lanes.

And SIMD and object-oriented code will always clash, anyway. OO is a lot of pointer chasing, and SIMD is completely useless there.

However, for physics calculation, higher lane counts may be a problem. You'll usually use 4-way SIMD for doing 3D vector calculations (using only three lanes, ignoring the fourth). So it won't be trivial to use 8-way SIMD on data structures not designed for it. However, this won't be a problem for all engines, as some engines will have been written by non-stupid people who could see that we'd get wider SIMD execution over time. And changing the engine code to accommodate more flexible SIMD lane widths will pay off in the long run, as we'll get wider and wider SIMD engines. Intel's Knights family already uses 16 lanes, and AVX is designed to be easily extendable, up to 1024bits/32 lanes IIRC.

However, it'll take a long time until AVX is widely used, simply because you can't depend on it being there. There are still so many C2D, K10, NHM, etc. systems in the market that it doesn't pay off to support AVX. Once you reach more than 50% market saturation that becomes attractive, but we're still a long way off from reaching this point - especially considering the longer PC replacement cycles we have these days.

It's not just a case of "non-stupid people". Supporting the kind of operation bundling you suggest makes the code a lot more complex, and dealing with branching logic is very troublesome. It gets a bit better with AVX because certain operations can be masked, but it's still not ideal.

The 128 bit gather added by AVX2 should also come in handy for existing 128bit wide operations.
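The masking being discussed is just branch-free select: compute both sides, then blend per lane through a mask. A scalar C sketch of the idea (SSE/AVX do the same thing per lane with CMPPS plus ANDPS/ANDNPS/ORPS, or a single blend instruction on newer parts):

```c
#include <stdint.h>

/* Branch-free select, the scalar analogue of SIMD mask-and-blend:
   mask is all-ones (0xFFFFFFFF) to take 'a', all-zeros to take 'b'. */
static uint32_t select_masked(uint32_t mask, uint32_t a, uint32_t b)
{
    return (a & mask) | (b & ~mask);
}

/* "if (v[i] > t) v[i] = 0;" without a branch, one lane at a time.
   In real SIMD code the whole loop body runs on 4 or 8 lanes at once. */
void clamp_above(int32_t *v, int n, int32_t t)
{
    for (int i = 0; i < n; ++i) {
        uint32_t mask = (uint32_t)-(v[i] > t);   /* all-ones if true */
        v[i] = (int32_t)select_masked(mask, 0u, (uint32_t)v[i]);
    }
}
```

This is also why branchy logic is painful in vector code: both sides of every branch get evaluated for every lane, and the mask only decides which result survives.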
 

inf64

Diamond Member
Mar 11, 2011
3,706
4,050
136
On the topic of upgrading to Haswell: the way I see it, we have two (generalized) groups of users that would see huge benefits from upgrading to a 4C/8T Haswell, and two groups that would see smaller benefits.

Those who would see a massive performance jump (from highest to lowest):
1) Users of pre-Conroe products; C2Q users (OCed and stock, 45/65nm); Phenom I & II X4 users (OCed and stock).
This group would see some amazing performance gains across the board. ST performance would jump up to 70% (or maybe even more), and MT performance in some cases >2x.
2) Nehalem-generation users (both stock and OCed); X6 users (both stock and OCed); FX-81xx (OCed); SB i5 4C/4T (stock and slightly OCed).
The Nehalem group would see some big gains in ST performance (~40+% for OCed i7 vs. OCed Haswell i7) and noticeable gains in MT performance (~25-30% is not impossible). Similar goes for OCed X6/FX users, the only difference being they would see a higher jump in ST performance (compared to Nehalem users) and maybe a similar or slightly lower jump in MT performance.
i5 2500K (or any 4C/4T SB-gen.) users would see smallish ST gains (~13% if their CPU is OCed) or somewhat higher gains (~20% if their SB i5 is stock). In MT workloads they would see very big gains though, ~45+% if their CPU is at stock and a bit lower if they OC. This is all in legacy code.

Those who would see smaller benefits from a Haswell upgrade:
1) FX-83xx users (mostly OCed). This group might see some big gains in ST software and some pretty mediocre (if that) gains in MT software; a mixed bag. On one hand, some older legacy apps that are not generally optimized for newer uarchitectures would run much better on Haswell; on the other hand, some newer apps that are optimized for newer uarchitectures (MTed + supporting advanced ISA) would run similarly on both. The only difference is that Haswell will still pull ahead in those AVX2-optimized MTed workloads when they show up in commercial software (by how much is unknown, since AVX2 alone is not a guarantee of doubled throughput, due to other possible code limitations). Price difference plays a big role (FX advantage), and power consumption too (i7 advantage).
2) SB i7 (stock and OCed), IB i7 (stock and OCed). Both the ST and MT performance increases in legacy software would be too small to warrant a complete platform change. SB owners might see slightly higher performance gains vs. IB users, but overall these users would have to plan on using some specialized software that supports AVX2/FMA3 in order to harness the true potential of Haswell. This might take a long time to happen, if we take other ISA extensions and their adoption rates in history as examples.
 
Last edited:

Pilum

Member
Aug 27, 2012
182
3
81
It's not just a case of "non-stupid people". Supporting the kind of operation bundling you suggest makes the code a lot more complex,
The problem of proper data organization for vector operations (AoS vs. SoA) is a very old one. x86 SIMD ISA extensions were introduced in the late 90s, and it was soon clear that the execution width would rise over time (SSE1 in 1999). By the mid-00s it should have been obvious that you need to pay lots of attention to your whole data storage design if you want to keep your code future-safe. If you ignored that whole topic... well, stupid.

Of course if you work on a very old codebase, a proper data storage organization may be hard to implement. Not doing that would still be stupid, even if perfectly explainable due to a variety of reasons (insufficient manpower allocation, management interdiction, etc.).

and dealing with branching logic is very troublesome.
You generally don't do branches in vector code; you use predication.

It gets a bit better with AVX because certain operations can be masked, but it's still not ideal.
What specifically do you mean? AVX doesn't have a vector select register (in contrast to LNI). Traditional masking with result blending has been possible since MMX, it's just more elegant with AVX.

The 128 bit gather added by AVX2 should also come in handy for existing 128bit wide operations.
Gather should always come in handy, also for 256-bit. Actually it should allow high-performance operations even on AoS, as you can just load a 256-bit register from two structures with 128-bit data with one instruction.
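To make the gather point concrete, here is a scalar C sketch of what a gather instruction does per lane: fetch one element per lane from non-contiguous addresses through a vector of indices. (The function is an illustration of the semantics, not an intrinsic.)

```c
/* Scalar sketch of gather semantics (what AVX2's VGATHERDPS does
   across 8 lanes at once): dst[i] = base[idx[i]] for each lane.
   With a stride-4 index pattern {0, 4, 8, ...} this pulls the x
   component out of an AoS array of xyzw structs, which is how
   SoA-style wide loads can be built on top of AoS data. */
void gather_f32(float *dst, const float *base, const int *idx, int lanes)
{
    for (int i = 0; i < lanes; ++i)
        dst[i] = base[idx[i]];
}
```

Whether the hardware implementation is fast enough to make AoS competitive with true SoA is a separate question, but the addressing flexibility is the point.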
 

NTMBK

Lifer
Nov 14, 2011
10,248
5,045
136
What specifically do you mean? AVX doesn't have a vector select register (in contrast to LNI). Traditional masking with result blending has been possible since MMX, it's just more elegant with AVX.

Blork, my mistake there. I've been too engrossed in Xeon Phi literature lately; I somehow managed to get that capability of it mixed up with AVX in my head. It doesn't help that I was looking at the AVX2 masked gather stuff.
 

Magic Carpet

Diamond Member
Oct 2, 2011
3,477
231
106
IMO, the best way to boost Haswell sales would be to release software that could make use of the new instructions. Anything on the horizon?
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
To the victor go the spoils.

My prediction: people who wanted the top-end performance part (people who bought the 2600K and 3770K) won't care that it costs $368 vs. $334. But people who generally have an axe to grind with Intel, regardless of price/performance, will make mountains out of molehills over this.

If you make a big deal over $34 for a high end CPU, then you probably can't afford it in the first place.
 

inf64

Diamond Member
Mar 11, 2011
3,706
4,050
136
IIRC 3770K had similar price at launch so this is no surprise.
 

guskline

Diamond Member
Apr 17, 2006
5,338
476
126
To the victor go the spoils.

My prediction: people who wanted the top-end performance part (people who bought the 2600K and 3770K) won't care that it costs $368 vs. $334. But people who generally have an axe to grind with Intel, regardless of price/performance, will make mountains out of molehills over this.

Agree 100%. Lordy, if you are complaining that a brand new Haswell 4770K costs ~$368, you have no business even buying one.

Travel to MicroCenter like I did and get a retail 3770k for $229.99 plus tax! (The complainers will still say that is too high!):biggrin:
 

Pheesh

Member
May 31, 2012
138
0
0
Even if Intel debuts the i7-4770K at the same MSRP as the i7-3770K, the retailers aren't going to price it as such, especially for pre-sales. Intel can set the same price from their side, but end retailers are going to raise their price for the first units. Supply and demand! Also, I'm wondering if the initial low volumes due to the chipset bug will result in some higher debut prices.
 

MisterMac

Senior member
Sep 16, 2011
777
0
0
Any overclocking leaks yet?

how about some leaks in the first place :C



A question for IDC:

Would an AVX2 patch for Windows cause a considerable speed-up in the general overhead that is the OS and DirectX and so on?

In short, would the massive increase AVX2 gives on certain workloads via new instructions/wider ports be felt on a general basis if implemented in the OS?
 

inf64

Diamond Member
Mar 11, 2011
3,706
4,050
136
AVX2 needs recompiled apps in order to give you any benefits. There is no magic inside Haswell, nor an OS patch, that can make use of AVX2 without the code being compiled with AVX2 in mind.
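The one middle ground is shipping both code paths and choosing at runtime. A minimal sketch using `__builtin_cpu_supports`, a GCC/Clang builtin (not ISO C) that checks CPUID; the function name here is made up for illustration:

```c
#include <string.h>

/* Runtime dispatch: the binary contains both an AVX2 path and a
   baseline path, and picks one at startup. __builtin_cpu_supports
   inspects CPUID, so the same executable runs everywhere and only
   uses AVX2 where it actually exists. */
const char *pick_simd_path(void)
{
    if (__builtin_cpu_supports("avx2"))
        return "avx2";
    return "sse2";   /* baseline guaranteed on any x86-64 CPU */
}
```

That still requires the app to be rebuilt with both paths, though, so it doesn't change the basic point: nothing speeds up until developers ship new binaries.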
 

NTMBK

Lifer
Nov 14, 2011
10,248
5,045
136
AVX2 needs recompiled apps in order to give you any benefits. There is no magic inside Haswell, nor an OS patch, that can make use of AVX2 without the code being compiled with AVX2 in mind.

This is, of course, one of the good things about Linux. Just recompile the whole damn thing with highly tuned flags to optimize for your specific hardware.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
how about some leaks in the first place :C



A question for IDC:

Would a AVX2 patch for Windows cause considerable speed up in the general overheard that is the OS and DirectX and so on?

In short, would the massive increase AVX2 gives on certain workloads by instructions\wider ports - be felt on a general basis if implemented in the OS?

Windows already supports AVX2. But as said by inf64, every app etc. needs to be recompiled, and that's not gonna happen. Windows Blue might have AVX2-supporting apps, just as it got extended support for Haswell's S0ix states.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
Fudzilla is wrong (as always).

You can already find the 4670K, for example, for $11 less than the price Fud quotes.
 

inf64

Diamond Member
Mar 11, 2011
3,706
4,050
136
Fudzilla needs clicks ;). That's why they are spamming/making news every 10 minutes.
 

MisterMac

Senior member
Sep 16, 2011
777
0
0
Windows already supports AVX2. But as said by inf64, every app etc. needs to be recompiled, and that's not gonna happen. Windows Blue might have AVX2-supporting apps, just as it got extended support for Haswell's S0ix states.


Regardless, that was my question.
For the internal code, each app obviously needs to be compiled with AVX2 flags in mind.


...but the general overhead of the OS/driver layer does come into play.
(Hello, GPUs magically increasing performance via drivers?)

I was just wondering whether AVX2 might make some root execution of kernel code/drivers run better, in a way that you'd feel, or synthetically see, greater performance with AVX2/Haswell in an IPC-like sense on Windows.

(Not just AVX2, but the extra port as well; Haswell in a nutshell.)
 

Avalon

Diamond Member
Jul 16, 2001
7,565
150
106
Results don't look particularly impressive but I'm going to get one because CPUs are my crack, and GPUs are my cocaine.