Intel "Haswell" Speculation thread


cbn

Lifer
Mar 27, 2009
12,968
221
106
15W is a lot. Hopefully, there will be more options. Anybody got a list of the whole line-up? Sorry, if this has already been mentioned.

I thought at one point there was talk of a 10 watt version?

P.S. Does anyone know if Haswell's TDP also includes the PCH? (The Ultrabook version is a one-chip solution with two dies on package.) If so, how much TDP does a PCH normally contribute overall?
 

Magic Carpet

Diamond Member
Oct 2, 2011
3,477
231
106
One of my ex-laptops had a U1500 processor which, according to Wiki, is only 5.5 W.

Hence, I am puzzled a bit.

EDIT:
Could be total SoC consumption though.
 

TuxDave

Lifer
Oct 8, 2002
10,572
3
71
One of my ex-laptops had a U1500 processor which, according to Wiki, is only 5.5 W.

Hence, I am puzzled a bit.

EDIT:
Could be total SoC consumption though.

It probably is and so you have to account for all the stuff we jammed in there.

The main thing to remember is that TDP power is one thing. Idle/standby power is another.
 

Magic Carpet

Diamond Member
Oct 2, 2011
3,477
231
106
Features carried over from Ivy Bridge:
Haswell is confirmed to have:
Haswell is expected to have:[2]
  • A new cache design.
  • New advanced power-saving system.
  • 2 to 6 cores available in the consumer market and 8 cores in the server version.
  • 1MB L2 cache per core and up to a 32MB L3 cache for the Extreme Edition and Xeon.[6]
  • New sockets - LGA 1150 for desktops and rPGA947 & BGA1364 for the mobile market.[7]
  • Fully integrated voltage regulator, thereby moving another component from the motherboard and onto the CPU.[8]
  • Thunderbolt technology.[9]
  • Support for hardware-based transactional memory.[10]
  • 15W TDP processors for the Ultrabook platform (multi-chip package like Westmere).[11]
  • 37, 47, 57W TDP for mobile processors.[12]
  • 35, 45, 65, 95W TDP for desktop processors.[12]
  • Integrated GPU up to 20 EUs.[13]
Desktop SKUs back to 95W. Interesting, must be packing quite a punch :biggrin:
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
15W is a lot. Hopefully, there will be more options. Anybody got a list of the whole line-up? Sorry, if this has already been mentioned.

Looking back on post #36 I noticed the 15 watt UltraBook Haswell SOC does include GT3 graphics.

Therefore, there is hope that average power consumption (as well as TDP) will be lower than we think for future tablet variants (at 1x nm process technology) if some of those graphics units can be disabled.

In fact, I think it has been mentioned before that Intel is working on "configurable TDPs" for these devices when they are docked (and have access to extra cooling).
 

gevorg

Diamond Member
Nov 3, 2004
5,075
1
0
15W for a hyperthreaded dual core and a decently performing IGP is actually pretty good. Especially if it can do near 1W idle states.

The 57W mobile and 95W desktop parts are hopefully only for a small selection of niche CPUs... for people like AnandTech users. :)
 

XX55XX

Member
Mar 1, 2010
177
0
0
Haswell/Broadwell could be a nice upgrade from my Yorkfield. But, alas, one or two years down the road, of course.
 

CPUarchitect

Senior member
Jun 7, 2011
223
0
0
Haswell is all about AVX2. Its key feature is 'gather' support, which in some cases enables an eightfold increase in performance.

Gather is the parallel version of a memory load operation, reading up to 8 different 32-bit memory locations simultaneously or nearly simultaneously. In fact it replaces 18 legacy instructions with just a single instruction!
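To make the gather semantics concrete, here is a minimal scalar sketch in Python. It only models what a single 8-lane gather of 32-bit elements computes, not how the hardware would execute it. The function name `gather8`, the lookup table, and the index values are all made up for illustration.

```python
from array import array

LANES = 8  # a 256-bit vector holds eight 32-bit elements


def gather8(memory, indices):
    """Scalar model of an 8-lane gather: lane i receives memory[indices[i]].

    The indices may point anywhere in `memory`, in any order, and may repeat.
    This is what distinguishes a gather from a plain sequential vector load.
    """
    if len(indices) != LANES:
        raise ValueError("expected exactly 8 indices")
    return [memory[i] for i in indices]


# Hypothetical lookup table of 32-bit integers (squares of 0..99).
table = array("i", (x * x for x in range(100)))

# Eight arbitrary, non-sequential positions, including a repeated one.
idx = [3, 97, 10, 10, 0, 42, 7, 64]
print(gather8(table, idx))
```

Without a gather instruction, a vectorized loop has to perform each of those eight loads separately and then insert the values into the vector one by one, which is where the long legacy instruction sequences come from.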

This is very significant because vector instructions were previously very hard for compilers to use. Having only sequential access to memory more often than not made compilers stick to slow scalar code. AVX2 finally allows compilers to auto-vectorize loops a lot more effectively. In fact, gather support is the key to the GPU's high performance in 'throughput computing'. So Intel is effectively enabling the CPU to sustain performance levels similar to those of a GPU of equivalent size. This is further reinforced by AVX2's support for fused multiply-add (FMA) instructions. Again, this is a feature borrowed from GPUs.

Because AVX2 is a massive extension, we should not expect any other major changes to the architecture, other than those needed to better support AVX2. For instance, it will demand higher cache bandwidth to sustain the high throughput, so cache bandwidth is expected to double compared to Ivy Bridge. The only realistic IPC improvement would be an extension of the macro-op fusion capabilities to support non-destructive scalar operations.

And because a quad-core Haswell chip will have higher floating-point performance on the CPU side than on the GPU side, you can expect to see a return of software vertex processing to enhance the graphics performance (much like the Cell high-throughput CPU assists the GPU in a PlayStation 3). This is also reinforced by the addition of 16-bit floating-point support for Ivy Bridge (called F16C).
 

Dadofamunky

Platinum Member
Jan 4, 2005
2,184
0
0
6-8 physical cores
Higher clockspeed
Better IPC
Higher overclock potential
HT on all CPUs
Better IGP capable of Crysis on medium/high
All CPUs unlocked
Lower load & idle power consumption

All for the low low price of £140!

...and a pony :twisted: :D
 

CPUarchitect

Senior member
Jun 7, 2011
223
0
0
But what I am interested in is the integration of GPGPU into mainstream apps. Could that iGPU be used for GP computing? Intel did announce Haswell would have multiple GPU cores.
No, integrated GPUs are far too weak for GPGPU purposes. The problem is that anything not closely resembling graphics runs at a low efficiency on the GPU. There's the communication overhead between the CPU and GPU, there's Amdahl's Law, there's register pressure, there's cache contention, limited RAM bandwidth, etc. all of which have prevented GPGPU from going mainstream.

But the solution is AVX2. It doubles the integer and floating-point vector performance, brings us gather support for parallel memory accesses, and all of this is unified into the CPU cores. So the heterogeneous communication overhead is eliminated. Out-of-order execution achieves higher instruction-level parallelism, so less data-level parallelism is required and hence it suffers less from Amdahl's Law. It also handles register pressure more elegantly, and better cache hit rates mean it requires less RAM bandwidth.

GPGPU is dead. We're getting a superior high-throughput CPU architecture instead.
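For anyone who wants to put numbers on the Amdahl's Law point, here is a small sketch of the standard formula, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that can run in parallel and n is the number of parallel lanes. The function name and the sample fractions below are assumptions chosen purely for illustration; they are not measurements of any real workload.

```python
def amdahl_speedup(parallel_fraction, lanes):
    """Overall speedup when only `parallel_fraction` of the work scales across `lanes`."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if lanes < 1:
        raise ValueError("lanes must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / lanes)


# Hypothetical workloads: how much an 8-wide unit helps for different parallel fractions.
for p in (0.5, 0.9, 0.99):
    print(f"p = {p:.2f}: speedup on 8 lanes = {amdahl_speedup(p, 8):.2f}x")
```

The serial remainder dominates quickly, which is why the speed of the scalar, out-of-order part of the chip matters so much for anything that is not almost perfectly parallel.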
 

denev2004

Member
Dec 3, 2011
105
1
0
Haswell is all about AVX2. Its key feature is 'gather' support, which in some cases enables an eightfold increase in performance.

Gather is the parallel version of a memory load operation, reading up to 8 different 32-bit memory locations simultaneously or nearly simultaneously. In fact it replaces 18 legacy instructions with just a single instruction!

This is very significant because vector instructions were previously very hard for compilers to use. Having only sequential access to memory more often than not made compilers stick to slow scalar code. AVX2 finally allows compilers to auto-vectorize loops a lot more effectively. In fact, gather support is the key to the GPU's high performance in 'throughput computing'. So Intel is effectively enabling the CPU to sustain performance levels similar to those of a GPU of equivalent size. This is further reinforced by AVX2's support for fused multiply-add (FMA) instructions. Again, this is a feature borrowed from GPUs.

Because AVX2 is a massive extension, we should not expect any other major changes to the architecture, other than those needed to better support AVX2. For instance, it will demand higher cache bandwidth to sustain the high throughput, so cache bandwidth is expected to double compared to Ivy Bridge. The only realistic IPC improvement would be an extension of the macro-op fusion capabilities to support non-destructive scalar operations.

And because a quad-core Haswell chip will have higher floating-point performance on the CPU side than on the GPU side, you can expect to see a return of software vertex processing to enhance the graphics performance (much like the Cell high-throughput CPU assists the GPU in a PlayStation 3). This is also reinforced by the addition of 16-bit floating-point support for Ivy Bridge (called F16C).
Oh really? Ivy Bridge starts to support 16-bit floating point?
Also, I'd like to ask: isn't the AVX shuffle in Sandy Bridge a parallel memory operation, which is part of "AVX 1.0"?
 

denev2004

Member
Dec 3, 2011
105
1
0
No, integrated GPUs are far too weak for GPGPU purposes. The problem is that anything not closely resembling graphics runs at a low efficiency on the GPU. There's the communication overhead between the CPU and GPU, there's Amdahl's Law, there's register pressure, there's cache contention, limited RAM bandwidth, etc. all of which have prevented GPGPU from going mainstream.

But the solution is AVX2. It doubles the integer and floating-point vector performance, brings us gather support for parallel memory accesses, and all of this is unified into the CPU cores. So the heterogeneous communication overhead is eliminated. Out-of-order execution achieves higher instruction-level parallelism, so less data-level parallelism is required and hence it suffers less from Amdahl's Law. It also handles register pressure more elegantly, and better cache hit rates mean it requires less RAM bandwidth.

GPGPU is dead. We're getting a superior high-throughput CPU architecture instead.
Actually... I saw a thread on Beyond3D arguing about whether Intel needs to use AVX2 to double the theoretical throughput, which mostly concerned whether they still need two 256-bit AVX units. One thing about it is that even Knights Corner has only one 512-bit LNI unit.
 

dealcorn

Senior member
May 28, 2011
247
4
76
GPGPU is dead. We're getting a superior high-throughput CPU architecture instead.

As a trivial refinement, GPGPU is dead on the desktop. In the HPC space, MIC still must demonstrate whether it is the best solution for all use scenarios.
 

dealcorn

Senior member
May 28, 2011
247
4
76
Also I hope the idiot that thought of sandwiching the RAM between the CPU got fired and Haswell doesn't go with this model. I really DON'T like Sandy-E's layout... actually I hate it with a passion.

You have a valid concern that affects CPU cooling, but you confused people when you said it backwards. Your concern is that you do not want the CPU located between sticks of RAM because it interferes with your CPU cooling solution. The person who laid out the motherboard traces was likely targeting a more mainstream customer. I suspect you need an attitude adjustment if you believe that every person who designs a product suitable for someone else merits termination.
 

CPUarchitect

Senior member
Jun 7, 2011
223
0
0
Oh really? Ivy Bridge starts to support 16-bit floating point?
Yes, according to the AVX software programming interface, and confirmed in the Intel forums, Ivy Bridge adds:

• Two instructions to support 16-bit floating-point data type conversion to and from single-precision floating-point type. Conversion to packed 16-bit floating-point values from packed single-precision floating-point values also provides rounding control using an immediate byte. These float-16 instructions convert packed data types of different sizes following the same manner as the 256-bit vector SIMD extension, AVX.
• One instruction that generates 16/32/64-bit wide random integers. The random number generator instruction operates on general-purpose registers.
• Four instructions that allow software working in 64-bit environment to read and write FS base and GS base registers in all privileged levels.

They might pull support for these instructions only if a bug is detected, which is not likely to happen.
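As a rough illustration of what the float-16 conversions do to a value, here is a small Python sketch that round-trips numbers through the IEEE 754 half-precision format using the standard `struct` module's `e` format code. This is only a software model under some assumptions: Python converts from double precision with round-to-nearest-even, whereas the hardware instructions convert packed single-precision values and let you pick the rounding mode through the immediate byte. The helper names are made up for this example.

```python
import struct


def float_to_half_bits(value):
    """Round `value` to half precision and return the raw 16-bit pattern."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def half_bits_to_float(bits):
    """Expand a raw 16-bit half-precision pattern back to a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


for value in (1.0, -2.0, 0.1, 65504.0):
    bits = float_to_half_bits(value)
    back = half_bits_to_float(bits)
    print(f"{value!r:>8} -> 0x{bits:04X} -> {back!r}")
```

With only 10 mantissa bits, a value like 0.1 comes back slightly changed, which is why half precision is mostly useful as a compact storage format (vertex data, textures) rather than for the arithmetic itself.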
Also, I'd like to ask: isn't the AVX shuffle in Sandy Bridge a parallel memory operation, which is part of "AVX 1.0"?
No, none of the AVX1 shuffle instructions allow access to non-sequential memory locations.
 

CPUarchitect

Senior member
Jun 7, 2011
223
0
0
Actually... I saw a thread on Beyond3D arguing about whether Intel needs to use AVX2 to double the theoretical throughput, which mostly concerned whether they still need two 256-bit AVX units.
Sandy Bridge has a 256-bit multiplier (MUL) and a 256-bit adder (ADD) unit. If Haswell had only one 256-bit fused multiply-add (FMA) unit, it would severely hurt floating-point performance. Legacy software doesn't use the FMA instructions, so only one MUL or ADD could be executed each clock cycle, instead of both simultaneously. And even software that does use FMA wouldn't be able to achieve the same performance as on Sandy Bridge, because FMA requires a dependent MUL and ADD, and code doesn't always offer such a pair. Also, given that gather is all about throughput computing, it wouldn't make sense to cripple performance in the execution units. It would also run counter to the goal of achieving higher performance/Watt.

Note also that Bulldozer features two 128-bit FMA units per module, on a 32 nm process. So it won't be an issue for Intel to equip Haswell with two 256-bit FMA units on 22 nm. The 256-bit paths are already there in Sandy Bridge.

So there should be no doubt that Haswell will feature two 256-bit FMA units, thereby doubling the peak throughput.
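The arithmetic behind this argument can be sketched in a few lines of Python. It counts peak single-precision FLOPs per cycle per core for a given mix of 256-bit units, assuming every unit accepts one vector instruction per cycle and that an FMA unit running legacy (non-FMA) code does just one MUL or one ADD per lane. The function name and the configuration labels are hypothetical; the two Haswell rows are speculation, not confirmed specs.

```python
VECTOR_BITS = 256
FLOAT_BITS = 32
LANES = VECTOR_BITS // FLOAT_BITS  # 8 single-precision lanes per 256-bit unit


def peak_flops_per_cycle(mul_units=0, add_units=0, fma_units=0, uses_fma=True):
    """Peak single-precision FLOPs per cycle for a hypothetical unit mix.

    An FMA unit counts as 2 FLOPs per lane only when the code actually issues
    FMA instructions; for legacy code it counts as 1 FLOP per lane.
    """
    flops_per_fma_lane = 2 if uses_fma else 1
    return LANES * (mul_units + add_units + fma_units * flops_per_fma_lane)


configs = [
    ("Sandy Bridge: 1 MUL + 1 ADD", dict(mul_units=1, add_units=1)),
    ("1 FMA unit, legacy code", dict(fma_units=1, uses_fma=False)),
    ("1 FMA unit, FMA code", dict(fma_units=1, uses_fma=True)),
    ("2 FMA units, legacy code", dict(fma_units=2, uses_fma=False)),
    ("2 FMA units, FMA code", dict(fma_units=2, uses_fma=True)),
]

for label, units in configs:
    print(f"{label:30s} {peak_flops_per_cycle(**units):2d} FLOPs/cycle")
```

Under these assumptions a single FMA unit halves legacy throughput relative to Sandy Bridge and at best ties it with FMA code, while two FMA units match Sandy Bridge on legacy code and double the peak with FMA.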
One thing about it is that even Knights Corner has only one 512-bit LNI unit.
Knights Corner is an in-order architecture aimed at the HPC market. It doesn't compete against desktop CPUs and it doesn't (have to) support legacy applications that make use of SSE. So its design is not an indication that Haswell would only have a single FMA unit.
 

jhu

Lifer
Oct 10, 1999
11,918
9
81
No, integrated GPUs are far too weak for GPGPU purposes. The problem is that anything not closely resembling graphics runs at a low efficiency on the GPU. There's the communication overhead between the CPU and GPU, there's Amdahl's Law, there's register pressure, there's cache contention, limited RAM bandwidth, etc. all of which have prevented GPGPU from going mainstream.

But the solution is AVX2. It doubles the integer and floating-point vector performance, brings us gather support for parallel memory accesses, and all of this is unified into the CPU cores. So the heterogeneous communication overhead is eliminated. Out-of-order execution achieves higher instruction-level parallelism, so less data-level parallelism is required and hence it suffers less from Amdahl's Law. It also handles register pressure more elegantly, and better cache hit rates mean it requires less RAM bandwidth.

GPGPU is dead. We're getting a superior high-throughput CPU architecture instead.

Good post. Just wanted to point out that FMA has been around since at least 1996, and Intel has had an implementation since 2001 with Itanium.
 

Tuna-Fish

Golden Member
Mar 4, 2011
1,353
1,546
136
Are you confident Haswell will be able to compete with GPUs in Bitcoin-like tasks?

No. Bitcoin is a special case -- there are practically no data requirements and there is no need for any control flow. This means it's all about peak integer output, which is something where the GPUs will still win. Haswell will be better than low-end GPUs in many other GPGPU loads that stress the memory side and that need more control flow.
 

Magic Carpet

Diamond Member
Oct 2, 2011
3,477
231
106
No. Bitcoin is a special case -- there are practically no data requirements and there is no need for any control flow. This means it's all about peak integer output, which is something where the GPUs will still win. Haswell will be better than low-end GPUs in many other GPGPU loads that stress the memory side and that need more control flow.
Bitcoin was just one example. I am pretty sure AMD's Next Generation Core will push the boundaries even further. At least until Haswell shows up, it is premature to write off GPGPU.

What's with the huge increase in L2 cache size? Is Intel maybe planning to nix the L3 cache on lower end SKUs? Or is the bigger L2 necessary for AVX2?
I am not competent enough to answer these questions, but I am sure as hell there are experts in this thread who can address them.