Intel "Haswell" Speculation thread

cbn · Jan 9, 2012

Magic Carpet said:
15W is a lot. Hopefully, there will be more options. Anybody got a list of the whole line-up? Sorry, if this has already been mentioned.

I thought at one point there was talk of a 10 watt version?

P.S. Does anyone know if Haswell's TDP also includes the PCH? (The Ultrabook version is a one-chip solution with two dies on package.) If so, how much TDP does a PCH normally contribute overall?

Magic Carpet · Jan 9, 2012

One of my ex-laptops had a U1500 processor which according to Wiki, is only 5.5w.

Hence, I am puzzled a bit.

EDIT:
Could be total SoC consumption though.

TuxDave · Jan 9, 2012

Magic Carpet said:
One of my ex-laptops had a U1500 processor which according to Wiki, is only 5.5w.

Hence, I am puzzled a bit.

EDIT:
Could be total SoC consumption though.

It probably is and so you have to account for all the stuff we jammed in there.

The main thing to remember is that TDP power is one thing. Idle/standby power is another.

Magic Carpet · Jan 9, 2012

Features carried over from Ivy Bridge:

A 22 nm manufacturing process.

3D tri-gate transistors (for Ivy Bridge processors and onwards).

A 14 stage pipeline (since the Core microarchitecture).

Haswell is confirmed to have:

Advanced Vector Extensions 2 (AVX2) instruction set (or Haswell New Instructions), including gather, bit manipulation, Floating Point Multiply Accumulate, and FMA3 support.[4]

Direct3D 11.1 and OpenGL 3.2 graphics unit.[5]

Haswell is expected to have:[2]

A new cache design.

New advanced power-saving system.

Up to 2~6 cores available in consumer market and 8 core in server version.

1MB L2 cache per core and up to a 32MB L3 cache for the Extreme Edition and Xeon.[6]

New sockets - LGA 1150 for desktops and rPGA947 & BGA1364 for the mobile market.[7]

Fully integrated voltage regulator, thereby moving another component from the motherboard and onto the CPU.[8]

Thunderbolt technology.[9]

Support for hardware-based transactional memory.[10]

15W TDP processors for the Ultrabook platform (multi-chip package like Westmere).[11]

37, 47, 57W TDP for mobile processors.[12]

35, 45, 65, 95W TDP for desktop processors.[12]

Integrated GPU up to 20 EUs.[13]

Desktop SKUs back to 95W. Interesting, must be packing quite a punch :biggrin:

cbn · Jan 9, 2012

Magic Carpet said:
15W is a lot. Hopefully, there will be more options. Anybody got a list of the whole line-up? Sorry, if this has already been mentioned.

Looking back on post #36 I noticed the 15 watt UltraBook Haswell SOC does include GT3 graphics.

Therefore there is hope that average power consumption (as well as TDP) will be lower than we think for future tablet variants (at 1x nm process technology) if some of those graphics units can be disabled.

In fact, I think it has been mentioned before that Intel is working on "configurable TDPs" for these devices when they are docked (and have access to extra cooling).

gevorg · Jan 9, 2012

15W for a hyperthreaded dual core and a decently performing IGP is actually pretty good. Especially if it can do near 1W idle states.

The 57W mobile and 95W desktop are hopefully only for a small selection of niche CPUs... like Anandtech users. 🙂

XX55XX · Jan 9, 2012

Haswell/Broadwell could be a nice upgrade from my Yorkfield. But, alas, one or two years down the road, of course.

denev2004 · Jan 9, 2012

Magic Carpet said:
Desktop SKUs back to 95W. Interesting, must be packing quite a punch :biggrin:

Wait, Is Westmere MCM? Does that refer to Clarkdale?

http://forums.anandtech.com/wiki/Thunderbolt_%28interface) And... what is the Thunderbolt technology?

TuxDave · Jan 9, 2012

denev2004 said:
Wait, Is Westmere MCM? Does that refer to Clarkdale?

http://forums.anandtech.com/wiki/Thunderbolt_(interface) And... what is the Thunderbolt technology?

http://www.apple.com/thunderbolt/

CPUarchitect · Jan 10, 2012

Haswell is all about AVX2. Its key feature is 'gather' support, which in some cases enables an eightfold increase in performance.

Gather is the parallel version of a memory load operation, reading up to 8 different 32-bit memory locations simultaneously or nearly simultaneously. In fact it replaces 18 legacy instructions with just a single instruction!
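To make the gather semantics concrete, here is a minimal scalar model in Python. It is only a sketch of what one 8-lane gather does logically, not the real instruction or its timing. The names `gather`, `table` and `idx` are made up for illustration, and the mask/fallback behaviour (masked-off lanes keep their old value) is my reading of the AVX2 programming reference.

```python
from typing import Optional, Sequence

LANES = 8  # eight 32-bit lanes fit in one 256-bit register


def gather(base: Sequence[float], indices: Sequence[int],
           mask: Optional[Sequence[bool]] = None,
           fallback: Optional[Sequence[float]] = None) -> list:
    """Scalar model of an 8-lane gather: out[i] = base[indices[i]].

    Lanes whose mask bit is False keep the corresponding fallback value
    instead of touching memory (hypothetical helper, for illustration).
    """
    if len(indices) != LANES:
        raise ValueError("expected exactly 8 indices")
    mask = [True] * LANES if mask is None else list(mask)
    fallback = [0.0] * LANES if fallback is None else list(fallback)
    return [base[idx] if keep else old
            for idx, keep, old in zip(indices, mask, fallback)]


table = [float(i * i) for i in range(100)]  # lookup table standing in for memory
idx = [3, 97, 0, 41, 41, 8, 15, 64]         # arbitrary, non-sequential indices
print(gather(table, idx))
```

Without gather, a vectorized loop has to do those eight loads one by one and insert each value into the register, which is where the long legacy instruction sequence comes from.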

This is very significant because vector instructions were previously very hard for compilers to use. Having only sequential access to memory more often than not made compilers stick to slow scalar code. AVX2 finally allows compilers to auto-vectorize loops a lot more effectively. In fact, gather support is the key to the GPU's high performance in 'throughput computing'. So Intel is effectively enabling the CPU to sustain performance levels similar to those of a GPU of equivalent size. This is further reinforced by AVX2's support for fused multiply-add (FMA) instructions. Again, this is a feature borrowed from GPUs.

Because AVX2 is a massive extension, we should not expect any other major changes to the architecture, except those needed to better support AVX2. For instance, it will demand higher cache bandwidth to sustain the high throughput, so cache bandwidth is expected to be doubled compared to Ivy Bridge. The only realistic IPC improvement would be an extension of the macro-op fusion capabilities to support non-destructive scalar operations.

And because a quad-core Haswell chip will have higher floating-point performance on the CPU side than on the GPU side, you can expect to see a return of software vertex processing to enhance the graphics performance (much like the Cell high-throughput CPU assists the GPU in a PlayStation 3). This is also reinforced by the addition of 16-bit floating-point support for Ivy Bridge (called F16C).

BenchPress · Jan 10, 2012

BallaTheFeared said:
Eww, HT... Do not want.

Why not? Sandy Bridge consistently offers up to 30% higher performance with Hyper-Threading in well-threaded workloads. It has come a long way since the Pentium 4 HT days.

Dadofamunky · Jan 10, 2012

Maximilian said:
6-8 physical cores
Higher clockspeed
Better IPC
Higher overclock potential
HT on all CPUs
Better IGP capable of crysis on medium/high
All CPU's unlocked
Lower load & idle power consumption

All for the low low price of £140!

...and a pony :twisted: 😀

CPUarchitect · Jan 10, 2012

lOl_lol_lOl said:
But what I am interested in is the integration of GPGPU into mainstream apps. Could that iGPU be used for general-purpose computing? Intel did announce Haswell would have multiple GPU cores.

No, integrated GPUs are far too weak for GPGPU purposes. The problem is that anything not closely resembling graphics runs at a low efficiency on the GPU. There's the communication overhead between the CPU and GPU, there's Amdahl's Law, there's register pressure, there's cache contention, limited RAM bandwidth, etc. all of which have prevented GPGPU from going mainstream.

But the solution is AVX2. It doubles the integer and floating-point vector performance, brings us gather support for parallel memory accesses, and all of this is unified into the CPU cores. So the heterogeneous communication overhead is eliminated. Out-of-order execution achieves higher instruction-level parallelism, so less data-level parallelism is required and hence it suffers less from Amdahl's Law. It also handles register pressure more elegantly, and better cache hit rates mean it requires less RAM bandwidth.

GPGPU is dead. We're getting a superior high-throughput CPU architecture instead.
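For anyone unfamiliar with the Amdahl's Law argument, a quick worked sketch in Python follows. The parallel fractions and unit counts are purely illustrative assumptions, not measurements of any CPU or GPU, and `amdahl_speedup` is just a helper name chosen for this example.

```python
def amdahl_speedup(parallel_fraction: float, units: int) -> float:
    """Overall speedup when only `parallel_fraction` of the work scales
    across `units` parallel execution resources (Amdahl's Law)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if units < 1:
        raise ValueError("units must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / units)


# Illustrative numbers only: 90% of the work is parallelizable.
for units in (8, 64, 1000):
    print(units, round(amdahl_speedup(0.9, units), 2))
```

With 10% serial work, going from 8 to 1000 parallel units only takes the speedup from about 4.7x to just under 10x, which is the sense in which a very wide device depends on the serial part being tiny.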

denev2004 · Jan 10, 2012

CPUarchitect said:
Haswell is all about AVX2. Its key feature is 'gather' support, which in some cases enables an eightfold increase in performance.

Gather is the parallel version of a memory load operation, reading up to 8 different 32-bit memory locations simultaneously or nearly simultaneously. In fact it replaces 18 legacy instructions with just a single instruction!

This is very significant because vector instructions were previously very hard for compilers to use. Having only sequential access to memory more often than not made compilers stick to slow scalar code. AVX2 finally allows compilers to auto-vectorize loops a lot more effectively. In fact, gather support is the key to the GPU's high performance in 'throughput computing'. So Intel is effectively enabling the CPU to sustain performance levels similar to those of a GPU of equivalent size. This is further reinforced by AVX2's support for fused multiply-add (FMA) instructions. Again, this is a feature borrowed from GPUs.

Because AVX2 is a massive extension, we should not expect any other major changes to the architecture, except those needed to better support AVX2. For instance, it will demand higher cache bandwidth to sustain the high throughput, so cache bandwidth is expected to be doubled compared to Ivy Bridge. The only realistic IPC improvement would be an extension of the macro-op fusion capabilities to support non-destructive scalar operations.

And because a quad-core Haswell chip will have higher floating-point performance on the CPU side than on the GPU side, you can expect to see a return of software vertex processing to enhance the graphics performance (much like the Cell high-throughput CPU assists the GPU in a PlayStation 3). This is also reinforced by the addition of 16-bit floating-point support for Ivy Bridge (called F16C).

Oh really? Ivy Bridge starts to support 16-bit floating point?
Also, I'd like to ask: isn't the AVX shuffle in Sandy Bridge a parallel memory operation, which is part of "AVX 1.0"?

denev2004 · Jan 10, 2012

CPUarchitect said:
No, integrated GPUs are far too weak for GPGPU purposes. The problem is that anything not closely resembling graphics runs at a low efficiency on the GPU. There's the communication overhead between the CPU and GPU, there's Amdahl's Law, there's register pressure, there's cache contention, limited RAM bandwidth, etc. all of which have prevented GPGPU from going mainstream.

But the solution is AVX2. It doubles the integer and floating-point vector performance, brings us gather support for parallel memory accesses, and all of this is unified into the CPU cores. So the heterogeneous communication overhead is eliminated. Out-of-order execution achieves higher instruction-level parallelism, so less data-level parallelism is required and hence it suffers less from Amdahl's Law. It also handles register pressure more elegantly, and better cache hit rates mean it requires less RAM bandwidth.

GPGPU is dead. We're getting a superior high-throughput CPU architecture instead.

Actually... I saw a thread on Beyond3D arguing about whether Intel needs to use AVX2 to double the theoretical throughput, which is mostly a question of whether they still need two 256-bit AVX units. One thing about it is that even Knights Corner has only one 512-bit LNI unit.

dealcorn · Jan 10, 2012

CPUarchitect said:
GPGPU is dead. We're getting a superior high-throughput CPU architecture instead.

As a trivial refinement, GPGPU is dead on the desktop. In the HPC space, MIC still has to prove that it is the best solution for all use scenarios.

dealcorn · Jan 10, 2012

aigomorla said:
Also I hope the idiot that thought of sandwiching the RAM between the CPU got fired and Haswell doesn't go with this model. I really DON'T like Sandy-E's layout... actually I hate it with a passion.

You have a valid concern that affects CPU cooling, but you confused people when you said it backwards. Your concern is that you do not want the CPU located between sticks of ram because it interferes with your CPU cooling solution. The person who laid out the motherboard traces likely was targeting a more mainstream customer. I suspect you need an attitude adjustment if you believe that every person who designs a product suitable for someone else merits termination.

CPUarchitect · Jan 10, 2012

denev2004 said:
Oh really? Ivy Bridge starts to support 16-bit floating point?

Yes, according to the AVX software programming interface, and confirmed in the Intel forums, Ivy Bridge adds:

Two instructions to support 16-bit floating-point data type conversion to and from single-precision floating-point type. Conversion to packed 16-bit floating-point values from packed single-precision floating-point values also provides rounding control using an immediate byte. These float-16 instructions convert packed data types of different sizes following the same manner as the 256-bit vector SIMD extension, AVX.
One instruction that generates random numbers of 16/32/64 bit wide random integers. The random number generator instruction operates on general-purpose registers.
Four instructions that allow software working in 64-bit environment to read and write FS base and GS base registers in all privileged levels.

Only if a bug is detected might they pull support for these instructions, which is not likely to happen.
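To show what the float-16 conversions described above do to the data, here is a small Python sketch using the standard `struct` module's half-precision format code. It only models the data format (IEEE half precision, round-to-nearest); it does not use the actual instructions, and the immediate-byte rounding control is not modelled. The helper names are made up for this example.

```python
import struct


def float_to_half_bits(x: float) -> int:
    """Convert a float to the 16-bit pattern of the nearest half-precision value."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def half_bits_to_float(bits: int) -> float:
    """Convert a 16-bit half-precision pattern back to a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def round_trip(values):
    """Store each value as float-16 and read it back, as a packed conversion would."""
    return [half_bits_to_float(float_to_half_bits(v)) for v in values]


print(hex(float_to_half_bits(1.0)))   # bit pattern of 1.0 in half precision
print(round_trip([0.5, -2.0, 0.1]))   # 0.1 is not exactly representable
```

The point is that float-16 halves the storage and bandwidth per value at the cost of precision, while arithmetic is still done in single precision after converting back.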

Also, I'd like to ask: isn't the AVX shuffle in Sandy Bridge a parallel memory operation, which is part of "AVX 1.0"?

No, none of the AVX1 shuffle instructions allow access to non-sequential memory locations.

Magic Carpet · Jan 10, 2012

CPUarchitect said:
GPGPU is dead. We're getting a superior high-throughput CPU architecture instead.

Are you confident Haswell will be able to compete with GPUs in Bitcoin-like tasks?

Obsoleet · Jan 10, 2012

XX55XX said:
Haswell/Broadwell could be a nice upgrade from my Yorkfield. But, alas, one or two years down the road, of course.

Same here. Yorkfield until Haswell.

CPUarchitect · Jan 10, 2012

denev2004 said:
Actually... I saw a thread on Beyond3D arguing about whether Intel needs to use AVX2 to double the theoretical throughput, which is mostly a question of whether they still need two 256-bit AVX units.

Sandy Bridge has a 256-bit multiplier (MUL) and a 256-bit adder (ADD) unit. If Haswell had only one 256-bit fused multiply-add (FMA) unit, it would severely hurt floating-point performance. Legacy software doesn't use the FMA instructions, so only one MUL or ADD could be executed each clock cycle, instead of both simultaneously. And even software which does use FMA wouldn't be able to achieve the same performance as on Sandy Bridge, because FMA requires a dependent MUL and ADD, which isn't always the case. Also, given that gather is all about throughput computing, it wouldn't make sense to cripple performance in the execution units. It would also run counter to the goal of achieving higher performance per watt.

Note also that Bulldozer features two 128-bit FMA units per module, on a 32 nm process. So it won't be an issue for Intel to equip Haswell with two 256-bit FMA units on 22 nm. The 256-bit paths are already there in Sandy Bridge.

So there should be no doubt that Haswell will feature two 256-bit FMA units, thereby doubling the peak throughput.
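The arithmetic behind this can be sketched in a few lines of Python. The Haswell configurations below are speculative, as in the rest of this thread, and the model assumes each unit issues one 256-bit operation per cycle and that an FMA unit can also execute a plain MUL or ADD. `flops_per_cycle` is a hypothetical helper, not anything from Intel.

```python
VECTOR_BITS = 256
LANE_BITS = 32
LANES = VECTOR_BITS // LANE_BITS  # 8 single-precision lanes per register


def flops_per_cycle(mul_units: int, add_units: int, fma_units: int,
                    code_uses_fma: bool) -> int:
    """Peak single-precision FLOPs per cycle per core under the assumptions above.

    An FMA unit counts as 2 FLOPs per lane when the code uses FMA
    instructions, and as 1 FLOP per lane when running legacy MUL/ADD code.
    """
    fma_flops = 2 if code_uses_fma else 1
    return LANES * (mul_units + add_units + fma_flops * fma_units)


print("Sandy Bridge, MUL + ADD:    ", flops_per_cycle(1, 1, 0, False))
print("One FMA unit, legacy code:  ", flops_per_cycle(0, 0, 1, False))
print("One FMA unit, FMA code:     ", flops_per_cycle(0, 0, 1, True))
print("Two FMA units, legacy code: ", flops_per_cycle(0, 0, 2, False))
print("Two FMA units, FMA code:    ", flops_per_cycle(0, 0, 2, True))
```

Under these assumptions a single FMA unit halves legacy throughput and at best matches Sandy Bridge's peak, while two FMA units keep legacy code at the same level and double the peak for FMA code.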

One thing about it is that even Knights Corner has only one 512-bit LNI unit.

Knights Corner is an in-order architecture aimed at the HPC market. It doesn't compete against desktop CPUs and it doesn't (have to) support legacy applications that make use of SSE. So its design is not an indication that Haswell would only have a single FMA unit.

jhu · Jan 10, 2012

CPUarchitect said:
No, integrated GPUs are far too weak for GPGPU purposes. The problem is that anything not closely resembling graphics runs at a low efficiency on the GPU. There's the communication overhead between the CPU and GPU, there's Amdahl's Law, there's register pressure, there's cache contention, limited RAM bandwidth, etc. all of which have prevented GPGPU from going mainstream.

But the solution is AVX2. It doubles the integer and floating-point vector performance, brings us gather support for parallel memory accesses, and all of this is unified into the CPU cores. So the heterogeneous communication overhead is eliminated. Out-of-order execution achieves higher instruction-level parallelism, so less data-level parallelism is required and hence it suffers less from Amdahl's Law. It also handles register pressure more elegantly, and better cache hit rates mean it requires less RAM bandwidth.

GPGPU is dead. We're getting a superior high-throughput CPU architecture instead.

Good post. Just wanted to point out that FMA has been around since at least 1996, and Intel has had an implementation since 2001 with Itanium.

frostedflakes · Jan 10, 2012

Magic Carpet said:
Desktop SKUs back to 95W. Interesting, must be packing quite a punch :biggrin:

What's with the huge increase in L2 cache size? Is Intel maybe planning to nix the L3 cache on lower end SKUs? Or is the bigger L2 necessary for AVX2?

Tuna-Fish · Jan 10, 2012

Magic Carpet said:
Are you confident Haswell will be able to compete with GPUs in Bitcoin-like tasks?

No. Bitcoin is a special case -- there are practically no data requirements and there is no need for any control flow. This means it's all about peak integer output, which is something where the GPUs will still win. Haswell will be better than low-end GPUs in many other GPGPU loads which stress the memory side and which need more control flow.
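To illustrate why this workload is so compute-bound, here is a simplified Python sketch of a Bitcoin-style proof-of-work search: hash a small fixed buffer plus a counter, twice with SHA-256, and compare against a target. The header bytes and the very easy target are placeholders chosen for the example, not a real block header or real difficulty, and the function names are hypothetical.

```python
import hashlib
import struct


def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice, the hash used by Bitcoin's proof-of-work."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def search_nonce(header_prefix: bytes, target: int, max_nonce: int):
    """Return the first nonce whose double hash is below `target`, or None."""
    for nonce in range(max_nonce):
        digest = double_sha256(header_prefix + struct.pack("<I", nonce))
        if int.from_bytes(digest, "little") < target:
            return nonce
    return None


prefix = b"\x00" * 76     # placeholder bytes, not a real block header
easy_target = 1 << 248    # roughly 1 in 256 hashes qualifies (toy difficulty)
print(search_nonce(prefix, easy_target, 100_000))
```

Every iteration touches the same few dozen bytes and does nothing but 32-bit integer adds, rotates and logic ops, so cache size, memory bandwidth and branch handling barely matter, and only raw integer throughput does.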

Magic Carpet · Jan 10, 2012

Tuna-Fish said:
No. Bitcoin is a special case -- there are practically no data requirements and there is no need for any control flow. This means it's all about peak integer output, which is something where the GPUs will still win. Haswell will be better than low-end GPUs in many other GPGPU loads which stress the memory side and which need more control flow.

Bitcoin was just one example. I am pretty sure AMD's Next Generation Core will push the boundaries even further. At least until Haswell shows up, it is premature to write off GPGPU.

frostedflakes said:
What's with the huge increase in L2 cache size? Is Intel maybe planning to nix the L3 cache on lower end SKUs? Or is the bigger L2 necessary for AVX2?

I am not competent enough to answer these questions, but I am sure as hell there are experts in this thread able to address them.
