[THG] Core i7-4770K: Haswell's Performance, Previewed

Page 12

itsmydamnation

Platinum Member
Feb 6, 2011
2,804
3,266
136
I don't suppose you have links to those Beyond3D articles do you? I'd like to read them to see what they suggest, but their search function is BeyondCrap D:


They are buried deep within the console forum, from before the NeoGAF noobs decided to run amok. But it had your normal sort of people answering: reppi, grall, etc.
 

Pilum

Member
Aug 27, 2012
182
3
81
256bit is, that's 8 32bit floats. There are already lots of 128bit INT SIMD ops in SSE. Wider vectors are exactly the same as "MOAR CORES!?!?!": you still need data to operate on. Some tasks will be much easier than others; there are some good topics about 256bit vectors for games on Beyond3D. Short story is 128bit SIMD is easy, 256bit is possible but generally requires a rethink in the way you write code, and it tends to clash with object-oriented coding.
If you could be a little more specific, this might actually be believable. Generally, for vectorizable code, there isn't a "wall" of vector lane count beyond which vectorization becomes hard. If you can do SIMD, you can usually do it in 4, 8, 16 or more lanes.

And SIMD and object-oriented code will always clash, anyway. OO is a lot of pointer chasing, and SIMD is completely useless there.

However, for physics calculation, higher lane counts may be a problem. You'll usually use 4-way SIMD for doing 3D vector calculations (using only three lanes, ignoring the fourth). So it won't be trivial to use 8-way SIMD on data structures not designed for it. However, this won't be a problem for all engines, as some engines will have been written by non-stupid people who could see that we'd get wider SIMD execution over time. And changing the engine code to accommodate more flexible SIMD lane widths will pay off in the long run, as we'll get wider and wider SIMD engines. Intel's Knights family already uses 16 lanes, and AVX is designed to be easily extendable, up to 1024bits/32 lanes IIRC.

However, it'll take a long time until AVX is widely used, simply because you can't depend on it being there. There are still so many C2D, K10, NHM, etc. systems in the market that it doesn't pay off to support AVX. Once you reach more than 50% market saturation that becomes attractive, but we're still a long way off from reaching this point - especially considering the longer PC replacement cycles we have these days.
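The data-layout point above (three-of-four lanes on 4-way SIMD vs. width-agnostic code) is really the old AoS-vs-SoA question. A minimal C sketch; the struct and function names here are made up for illustration, not from any real engine:

```c
#include <stddef.h>

/* AoS: one struct per particle. The x-values are strided in memory,
   so an 8-wide SIMD load cannot grab 8 consecutive x's. */
struct ParticleAoS { float x, y, z, w; };

/* SoA: each component in its own contiguous array. The loop below
   vectorizes naturally at any lane width (4, 8, 16, ...). */
struct ParticlesSoA {
    float *x, *y, *z;
    size_t n;
};

/* Scale all positions. From this same source, a vectorizing compiler
   can emit SSE, AVX, or AVX-512 code, whatever width the target has. */
void scale_soa(struct ParticlesSoA *p, float s)
{
    for (size_t i = 0; i < p->n; ++i) {
        p->x[i] *= s;
        p->y[i] *= s;
        p->z[i] *= s;
    }
}
```

The point is the layout, not intrinsics: the SoA version never cares whether the hardware is 4-wide or 8-wide, which is exactly why engines designed this way absorb wider SIMD for free.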
 

NTMBK

Lifer
Nov 14, 2011
10,248
5,045
136
If you could be a little more specific, this might actually be believable. Generally, for vectorizable code, there isn't a "wall" of vector lane count beyond which vectorization becomes hard. If you can do SIMD, you can usually do it in 4, 8, 16 or more lanes.

And SIMD and object-oriented code will always clash, anyway. OO is a lot of pointer chasing, and SIMD is completely useless there.

However, for physics calculation, higher lane counts may be a problem. You'll usually use 4-way SIMD for doing 3D vector calculations (using only three lanes, ignoring the fourth). So it won't be trivial to use 8-way SIMD on data structures not designed for it. However, this won't be a problem for all engines, as some engines will have been written by non-stupid people who could see that we'd get wider SIMD execution over time. And changing the engine code to accommodate more flexible SIMD lane widths will pay off in the long run, as we'll get wider and wider SIMD engines. Intel's Knights family already uses 16 lanes, and AVX is designed to be easily extendable, up to 1024bits/32 lanes IIRC.

However, it'll take a long time until AVX is widely used, simply because you can't depend on it being there. There are still so many C2D, K10, NHM, etc. systems in the market that it doesn't pay off to support AVX. Once you reach more than 50% market saturation that becomes attractive, but we're still a long way off from reaching this point - especially considering the longer PC replacement cycles we have these days.

It's not just a case of "non-stupid people". Supporting the kind of operation bundling you suggest makes the code a lot more complex, and dealing with branching logic is very troublesome. It gets a bit better with AVX because certain operations can be masked, but it's still not ideal.

The 128 bit gather added by AVX2 should also come in handy for existing 128bit wide operations.
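The masking being discussed is just branch-free select: compute both sides, then blend per lane through a mask. A scalar C sketch of the idea (SSE/AVX do the same thing per lane with CMPPS plus ANDPS/ANDNPS/ORPS, or a single blend instruction on newer parts):

```c
#include <stdint.h>

/* Branch-free select, the scalar analogue of SIMD mask-and-blend:
   mask is all-ones (0xFFFFFFFF) to take 'a', all-zeros to take 'b'. */
static uint32_t select_masked(uint32_t mask, uint32_t a, uint32_t b)
{
    return (a & mask) | (b & ~mask);
}

/* "if (v[i] > t) v[i] = 0;" without a branch, one lane at a time.
   In real SIMD code the whole loop body runs on 4 or 8 lanes at once. */
void clamp_above(int32_t *v, int n, int32_t t)
{
    for (int i = 0; i < n; ++i) {
        uint32_t mask = (uint32_t)-(v[i] > t);   /* all-ones if true */
        v[i] = (int32_t)select_masked(mask, 0u, (uint32_t)v[i]);
    }
}
```

This is also why branchy logic is painful in vector code: both sides of every branch get evaluated for every lane, and the mask only decides which result survives.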
 

inf64

Diamond Member
Mar 11, 2011
3,706
4,050
136
On the topic of upgrading to Haswell: the way I see it, we have two (generalized) groups of users that would see huge benefits from upgrading to a 4C/8T Haswell, and two groups that would see smaller benefits.

Those who would see a massive performance jump (from highest to lowest):
1) Users of pre-Conroe products; C2Q users (OCed and stock, 45/65nm); Phenom I & II X4 users (OCed and stock).
This group would see some amazing performance gains across the board. ST performance would jump up to 70% (or maybe even more), and MT performance in some cases >2x.
2) Nehalem-generation users (both stock and OCed); X6 users (both stock and OCed); FX-81xx (OCed); SB i5 4C/4T (stock and slightly OCed).
The Nehalem group would see some big gains in ST performance (~40+% for OCed i7 vs. OCed Haswell i7) and noticeable gains in MT performance (~25-30% is not impossible). Similar goes for OCed X6/FX users, the only difference being they would see a higher jump in ST performance (compared to Nehalem users) and maybe a similar or slightly lower jump in MT performance.
i5 2500K (or any 4C/4T SB-gen.) users would see smallish ST gains (~13% if their CPU is OCed) or somewhat higher gains (~20% if their SB i5 is stock). In MT workloads they would see very big gains though, ~45+% if their CPU is at stock and a bit lower if they OC. This is all in legacy code.

Those who would see smaller benefits from a Haswell upgrade:
1) FX-83xx users (mostly OCed). This group might see some big gains in ST software and some pretty mediocre (if that) gains in MT software; a mixed bag. On one hand, some older legacy apps that are not generally optimized for newer uarchitectures would run much better on Haswell; on the other hand, some newer apps that are optimized for newer uarchitectures (MTed + supporting advanced ISA) would run similarly on both. The only difference is that Haswell will still pull ahead in those AVX2-optimized MTed workloads when they show up in commercial software (by how much is unknown, since AVX2 alone is not a guarantee of doubled throughput, due to other possible code limitations). Price difference plays a big role (FX advantage), and power consumption too (i7 advantage).
2) SB i7 (stock and OCed), IB i7 (stock and OCed). Both the ST and MT performance increases in legacy software would be too small to warrant a complete platform change. SB owners might see slightly higher performance gains vs. IB users, but overall these users would have to plan on using some specialized software that supports AVX2/FMA3 in order to harness the true potential of Haswell. This might take a long time to happen, if we take other ISA extensions and their adoption rates in history as examples.
 
Last edited:

Pilum

Member
Aug 27, 2012
182
3
81
It's not just a case of "non-stupid people". Supporting the kind of operation bundling you suggest makes the code a lot more complex,
The problem of proper data organization for vector operations (AoS vs. SoA) is a very old one. x86 SIMD ISA extensions were introduced in the late 90s, and it was soon clear that the execution width would rise over time (SSE1 in 1999). By the mid-00s it should have been obvious that you need to pay lots of attention to your whole data storage design if you want to keep your code future-safe. If you ignored that whole topic... well, stupid.

Of course if you work on a very old codebase, a proper data storage organization may be hard to implement. Not doing that would still be stupid, even if perfectly explainable due to a variety of reasons (insufficient manpower allocation, management interdiction, etc.).

and dealing with branching logic is very troublesome.
You generally don't do branches in vector code; you use predication.

It gets a bit better with AVX because certain operations can be masked, but it's still not ideal.
What specifically do you mean? AVX doesn't have a vector select register (in contrast to LNI). Traditional masking with result blending has been possible since MMX, it's just more elegant with AVX.

The 128 bit gather added by AVX2 should also come in handy for existing 128bit wide operations.
Gather should always come in handy, also for 256-bit. Actually it should allow high-performance operations even on AoS, as you can just load a 256-bit register from two structures with 128-bit data with one instruction.
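To make the gather point concrete, here is a scalar C sketch of what a gather instruction does per lane: fetch one element per lane from non-contiguous addresses through a vector of indices. (The function is an illustration of the semantics, not an intrinsic.)

```c
/* Scalar sketch of gather semantics (what AVX2's VGATHERDPS does
   across 8 lanes at once): dst[i] = base[idx[i]] for each lane.
   With a stride-4 index pattern {0, 4, 8, ...} this pulls the x
   component out of an AoS array of xyzw structs, which is how
   SoA-style wide loads can be built on top of AoS data. */
void gather_f32(float *dst, const float *base, const int *idx, int lanes)
{
    for (int i = 0; i < lanes; ++i)
        dst[i] = base[idx[i]];
}
```

Whether the hardware implementation is fast enough to make AoS competitive with true SoA is a separate question, but the addressing flexibility is the point.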
 

NTMBK

Lifer
Nov 14, 2011
10,248
5,045
136
What specifically do you mean? AVX doesn't have a vector select register (in contrast to LNI). Traditional masking with result blending has been possible since MMX, it's just more elegant with AVX.

Blork, my mistake there. I've been too engrossed in Xeon Phi literature lately; I somehow managed to get that capability of it mixed up with AVX in my head. It doesn't help that I was looking at the AVX2 masked gather stuff.
 

Magic Carpet

Diamond Member
Oct 2, 2011
3,477
231
106
IMO, the best way to boost Haswell sales would be to release software that could make use of the new instructions. Anything on the horizon?
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
To the victor go the spoils.

My prediction: people who wanted the top-end performance part (people who bought the 2600K and 3770K) won't care that it costs $368 vs. $334. But people who generally have an axe to grind with Intel, regardless of price/performance, will make mountains out of molehills over this.

If you make a big deal over $34 for a high end CPU, then you probably can't afford it in the first place.
 

inf64

Diamond Member
Mar 11, 2011
3,706
4,050
136
IIRC 3770K had similar price at launch so this is no surprise.
 

guskline

Diamond Member
Apr 17, 2006
5,338
476
126
To the victor go the spoils.

My prediction: people who wanted the top-end performance part (people who bought the 2600K and 3770K) won't care that it costs $368 vs. $334. But people who generally have an axe to grind with Intel, regardless of price/performance, will make mountains out of molehills over this.

Agree 100%. Lordy, if you are complaining that a brand new Haswell 4770K costs ~$368, you have no business even buying one.

Travel to MicroCenter like I did and get a retail 3770k for $229.99 plus tax! (The complainers will still say that is too high!):biggrin:
 

Pheesh

Member
May 31, 2012
138
0
0
Even if Intel debuts the i7-4770K at the same MSRP as the i7-3770K, the retailers aren't going to price it as such, especially for pre-sales. Intel can set the same price from their side, but end retailers are going to raise their price for the first units. Supply and demand! Also, I'm wondering if the initial low volumes due to the chipset bug will result in some higher debut prices.
 

MisterMac

Senior member
Sep 16, 2011
777
0
0
Any overclocking leaks yet?

how about some leaks in the first place :C



A question for IDC:

Would an AVX2 patch for Windows cause a considerable speed-up in the general overhead that is the OS and DirectX and so on?

In short, would the massive increase AVX2 gives on certain workloads via new instructions/wider ports be felt on a general basis if implemented in the OS?
 

inf64

Diamond Member
Mar 11, 2011
3,706
4,050
136
AVX2 needs recompiled apps in order to give you any benefits. There is no magic inside Haswell, nor an OS patch, that can make use of AVX2 without the code being compiled with AVX2 in mind.
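The one middle ground is shipping both code paths and choosing at runtime. A minimal sketch using `__builtin_cpu_supports`, a GCC/Clang builtin (not ISO C) that checks CPUID; the function name here is made up for illustration:

```c
#include <string.h>

/* Runtime dispatch: the binary contains both an AVX2 path and a
   baseline path, and picks one at startup. __builtin_cpu_supports
   inspects CPUID, so the same executable runs everywhere and only
   uses AVX2 where it actually exists. */
const char *pick_simd_path(void)
{
    if (__builtin_cpu_supports("avx2"))
        return "avx2";
    return "sse2";   /* baseline guaranteed on any x86-64 CPU */
}
```

That still requires the app to be rebuilt with both paths, though, so it doesn't change the basic point: nothing speeds up until developers ship new binaries.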
 

NTMBK

Lifer
Nov 14, 2011
10,248
5,045
136
AVX2 needs recompiled apps in order to give you any benefits. There is no magic inside Haswell, nor an OS patch, that can make use of AVX2 without the code being compiled with AVX2 in mind.

This is, of course, one of the good things about Linux. Just recompile the whole damn thing with highly tuned flags to optimize for your specific hardware.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
how about some leaks in the first place :C



A question for IDC:

Would a AVX2 patch for Windows cause considerable speed up in the general overheard that is the OS and DirectX and so on?

In short, would the massive increase AVX2 gives on certain workloads by instructions\wider ports - be felt on a general basis if implemented in the OS?

Windows already supports AVX2. But as said by inf64, every app etc. needs to be recompiled, and that's not gonna happen. Windows Blue might have AVX2-supporting apps, just as it got extended support for Haswell's S0ix states.
 

ShintaiDK

Lifer
Apr 22, 2012
20,378
145
106
Fudzilla is wrong (as always).

You can already find the 4670K, for example, for $11 less than the price Fud quotes.
 

inf64

Diamond Member
Mar 11, 2011
3,706
4,050
136
Fudzilla needs clicks ;). That's why they are spamming/making news every 10 minutes.
 

MisterMac

Senior member
Sep 16, 2011
777
0
0
Windows already supports AVX2. But as said by inf64, every app etc. needs to be recompiled, and that's not gonna happen. Windows Blue might have AVX2-supporting apps, just as it got extended support for Haswell's S0ix states.


Regardless, that was my question.
For the internal code, each app obviously needs to be compiled with AVX2 flags in mind.


...but the general overhead of the OS/driver layer does come into play.
(Hello, GPUs magically increasing performance via drivers?)

I was just wondering whether AVX2 might make some root execution of kernel code/drivers run better, in a way that you'd feel, or synthetically see, greater performance with AVX2/Haswell in an IPC-like sense on Windows.

(Not just AVX2, but the extra port as well; Haswell in a nutshell.)
 

Avalon

Diamond Member
Jul 16, 2001
7,565
150
106
Results don't look particularly impressive but I'm going to get one because CPUs are my crack, and GPUs are my cocaine.