Haswell Core Count


exar333

Diamond Member
Feb 7, 2004
8,518
8
91
Don't call my post silly when you give the most ridiculous argument ever. Hindsight is 20/20. AMD was betting on software to be heavily multi-threaded by now. And by the time they realized that was a mistake, they had already radically changed their micro-architecture. They have limited engineering resources and they had no choice but to finish this complex project. Note that it takes Intel two years between every tock, the project timelines overlap, and they make less radical changes.

Only after Bulldozer was finished could AMD start concentrating on IPC again. What you're suggesting is plain stupid. You can't build a pickup truck and halfway through decide you want a sports car instead...

What IPC improvements are going into Haswell that you know of?

My post was more about the general potential AMD has for improving IPC, not so much about specific designs. That said, Haswell will be facing Steamroller, not Piledriver. And I'm not claiming the latter will be 30% faster, only that this would be within the realm of possible.

It's easier to catch up than improve an almost perfect design.

You obviously don't understand CPUs, so arguing is pointless. Enhancements to IPC have already been listed here for Haswell. Probably something like 5-10% will be the norm (again). The point of Haswell is AVX2 integration and incorporating existing GPGPU-like functionality that will quite literally enable some applications to have HUGE gains.

BD is a broken truck and will require MANY iterations to get right. By your logic, AMD could just give BD 30% more performance. Intel could do that too. Just 'ramp up' clocks and power usage be d@mned. Unfortunately we don't all live in your fantasy world. :)
 

Edrick

Golden Member
Feb 18, 2010
1,939
230
106
There has been one AVX application since SB and BD were released: x264 HD. After that long, we still only have that one. I hope we will have more than one on Haswell's release day, but I doubt it.

As I, and many others, have already stated many times, AVX was just a 'framework' for AVX2. AVX by itself is only useful to a very limited number of applications, but with AVX2, almost all applications will be able to take advantage of the performance increase.
 

Edrick

Golden Member
Feb 18, 2010
1,939
230
106
I agree that more efficiency is actually a better improvement, but without extra cores Intel is missing an opportunity to offer an even better product. Intel could have done both: more cores AND better efficiency per core.

Better product for who? The people who just want MOAR CORES and do not care/understand any other aspect of CPUs? Sorry, but Haswell for mainstream will be a perfect fit for 90% of PC users. And for the other 10% we will have Haswell-E which will shine with 6, 8 and even 10 cores.
 

Lepton87

Platinum Member
Jul 28, 2009
2,544
9
81
AMD was not betting on that at all. AMD found the cheapest way they could market an "8 core" processor and hope people buy into the marketing BS. And to a certain extent it works. How many people actually know that BD's core count is more like 8 half cores which don't even equate to 4 full ones? They are quite literally 8 crippled cores. If Intel couldn't pull off a paradigm shift with Itanium, AMD certainly isn't going to with BD. Nor do I think they really thought they would.

CMT certainly isn't the problem with BD. If AMD doubled the number of cores and disabled CMT it would be a true 8-core, but it would still be terribly disappointing. It would have the same disappointing single-thread performance and gain 20% performance in 5-8 threaded workloads.
 

pcunite

Senior member
Nov 15, 2007
336
1
76
AMD on the other hand has a lot of opportunity for catching up. There's a lot that can be improved and a 30% increase in IPC really is within the realm of possible.

I think most of us are tired of them saying that ... we've moved on. I'll check in to AMD in ten years to see where they are.
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
As I, and many others, have already stated many times, AVX was just a 'framework' for AVX2. AVX by itself is only useful to a very limited number of applications, but with AVX2, almost all applications will be able to take advantage of the performance increase.

I'm not debating whether all applications are able to benefit from AVX2. I'm saying that on the first day of Haswell's release, we will not have that many applications with AVX2 support.

You will not see any tangible performance increase in legacy software going from IB to Haswell (from the specs we know today). Today's multithreaded programs would gain a 30-50% instant performance increase with two more cores. A Quad core Haswell will have to wait for applications to be recompiled or written with AVX2 in order to gain more than 5-10% IPC and performance over the quad core IB.

A six core Haswell would get 30-50% more performance from day one ;)
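
As a rough check on the 30-50% figure, Amdahl's law gives a back-of-the-envelope estimate. The sketch below is not from the thread: the function names and the parallel fractions are assumptions chosen for illustration, and the model ignores Hyper-Threading, turbo clocks and memory bandwidth.

```c
#include <assert.h>

/* Amdahl's law: speedup over a single core when a fraction p of the work
 * (0.0 to 1.0) scales perfectly across n cores and the rest stays serial. */
double amdahl_speedup(double p, int n)
{
    assert(n > 0 && p >= 0.0 && p <= 1.0);
    return 1.0 / ((1.0 - p) + p / (double)n);
}

/* Relative gain from moving from `from` cores to `to` cores; 0.30 means
 * 30% faster. Hypothetical helper introduced for this example only. */
double gain_from_more_cores(double p, int from, int to)
{
    return amdahl_speedup(p, to) / amdahl_speedup(p, from) - 1.0;
}
```

Under this model, going from 4 to 6 cores yields 50% only when the work is perfectly parallel (p = 1.0), 30% at p = 0.9, and roughly 7% at p = 0.5. So the 30-50% range would apply to workloads that are at least about 90% parallel.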
 

Lonbjerg

Diamond Member
Dec 6, 2009
4,419
0
0
I'm not debating whether all applications are able to benefit from AVX2. I'm saying that on the first day of Haswell's release, we will not have that many applications with AVX2 support.

You will not see any tangible performance increase in legacy software going from IB to Haswell (from the specs we know today). Today's multithreaded programs would gain a 30-50% instant performance increase with two more cores. A Quad core Haswell will have to wait for applications to be recompiled or written with AVX2 in order to gain more than 5-10% IPC and performance over the quad core IB.

A six core Haswell would get 30-50% more performance from day one ;)

Lies, you didn't expect a SINGLE application to be ready for the launch day of "Haswell".
So we have now gone from using a fallacy...to dedicated lies :whistle:
 

BenchPress

Senior member
Nov 8, 2011
392
0
0
AMD was not betting on that at all. AMD found the cheapest way they could market an "8 core" processor and hope people buy into the marketing BS.
You really think that out of all the directions AMD could have chosen, they picked the one that would fool people the most? Then why not go for a 20 GHz single-core? It's perfectly within reach when sacrificing lots of IPC, and by your logic it would sell like hotcakes to the ignorant.

Seriously now, real-world results matter. AMD made an honest engineering mistake by expecting core count to be the primary factor in CPU performance. Just look at how fast we went from dual-core to quad-core. We should be at 16 core for the mainstream market by now if that was a trend.

The reality is that adding more cores is a last resort. Besides sheer clock frequency, there are three types of parallelism to crank up the performance:
- Instruction Level Parallelism
- Data Level Parallelism
- Thread Level Parallelism

Developers love ILP, because it doesn't really take them any effort to benefit from it. Creating software is complex enough as it is. But ILP is limited and extracting more takes disproportionate effort (read: power) from the hardware.

DLP is the next best thing. It takes vector instructions to extract it, and they come in two flavors: horizontal and vertical. The horizontal kind has limited usability (literally vector math), and it takes some developer effort to get good results. The vertical kind is what a GPU uses, and is easy to develop for and widely applicable. This is what is finally coming to the CPU with Haswell's AVX2.

And lastly we have TLP, which puts all of the responsibility for achieving parallelism into the hands of developers. Beyond basic asynchronous processing on a dual-core, it quickly becomes really hard to orchestrate many tasks on a multi-core CPU. Haswell's TSX is going to help a great deal by automating that to some extent and lowering the overhead. This will take time though, and since transistor counts keep increasing exponentially it is a certainty that core count will go up in the future if there are no better alternatives left.

What you have to realize is that the type of applications that currently benefit from more than four cores, are the ones with data level parallelism, i.e. doing the same operations on different portions of data! So while you can also extract that DLP using multiple cores, it's more efficient and easier to develop for using DLP tools. That is, wide vertical vector instructions, like AVX2.
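
To make "the same operations on different portions of data" concrete, here is a minimal sketch in plain C. The function names are made up for illustration. The first loop has independent iterations, so a vectorizing compiler may process several elements per instruction; the second carries a value from one iteration to the next, so it cannot be split up the same way without restructuring.

```c
#include <stddef.h>

/* Data-level parallelism: element i depends only on x[i] and y[i], so the
 * iterations can run in any order, or several at once in a vector register. */
void scale_add(float *restrict out, const float *restrict x,
               const float *restrict y, float a, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = a * x[i] + y[i];
    }
}

/* Loop-carried dependency: each iteration needs the running total produced
 * by the previous one, so a straightforward vector version does not apply. */
void prefix_sum(float *restrict out, const float *restrict x, size_t n)
{
    float running = 0.0f;
    for (size_t i = 0; i < n; i++) {
        running += x[i];
        out[i] = running;
    }
}
```

A loop like `scale_add` could equally be split across threads, which is the point being made above: the parallelism is in the data, and cores are only one way to exploit it.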

I'll take a Haswell quad-core over an Ivy Bridge 6-core any day.
 

AtenRa

Lifer
Feb 2, 2009
14,001
3,357
136
@Lonbjerg

We have a year ahead of us, let's pray we will have enough apps ready for users to take advantage of them.

I'm not against AVX/AVX2, I am a supporter. ;)

ps: learn to have a conversation and stop calling other people's opinions lies. Counter what I'm saying with evidence or any logical conversation but stop the name-calling.
 

2is

Diamond Member
Apr 8, 2012
4,281
131
106
Real world results do matter. BD looks nice on paper but fails in the real world. I believe AMD chose the only path they could. Can't beat Intel core for core so let's beat them on core count and develop it as cheaply as possible. 20 cores = larger die = more expensive. Not sure what point you're trying to make with that poor example. Their "our 8 or their 4" marketing campaign is a testament to that. Too bad for them, Intel only needs 2 cores.

You're seriously deluding yourself if you think AMD developed BD thinking every software developer was going to bow down at its awesomeness, give AMD a reach around and start optimizing software for BD.
 

Edrick

Golden Member
Feb 18, 2010
1,939
230
106
I'm not debating whether all applications are able to benefit from AVX2. I'm saying that on the first day of Haswell's release, we will not have that many applications with AVX2 support.

There will be some. But that number will grow quicker than AVX did. So I agree with you to a certain point.

You will not see any tangible performance increase in legacy software going from IB to Haswell (from the specs we know today). Today's multithreaded programs would gain a 30-50% instant performance increase with two more cores. A Quad core Haswell will have to wait for applications to be recompiled or written with AVX2 in order to gain more than 5-10% IPC and performance over the quad core IB.

You will see similar performance gains as we did from Nehalem to SB (Tock), and even better gains than SB to IB. Expecting anything more from "legacy code" is just not realistic.


A six core Haswell would get 30-50% more performance from day one ;)

No. Only if the application can take advantage of more than 8 threads (4C/8T). How many of those can you name? Or actually use for that matter?
 

dma0991

Platinum Member
Mar 17, 2011
2,723
1
0
Real world results do matter. BD looks nice on paper but fails in the real world. I believe AMD chose the only path they could. Can't beat Intel core for core so let's beat them on core count and develop it as cheaply as possible. 20 cores = larger die = more expensive. Not sure what point you're trying to make with that poor example. Their "our 8 or their 4" marketing campaign is a testament to that. Too bad for them, Intel only needs 2 cores.

You're seriously deluding yourself if you think AMD developed BD thinking every software developer was going to bow down at its awesomeness, give AMD a reach around and start optimizing software for BD.
BD was made specifically for servers and it has a better reputation there than it does on the desktop. It wasn't the only path that they could choose, nor is it a deception of "moar cores = better performance" that the less informed would be led to believe.

What didn't work was that they tried to make it work for the desktop, which favors fewer cores and more performance per core. Let me emphasize my point: BD was made from the ground up for a niche server market. It is not so spectacular on the desktop because that was never their aim. Of course the marketing department had to sell them one way or the other and made it seem that it gives ultimate gaming performance, but that backfired.
 

BenchPress

Senior member
Nov 8, 2011
392
0
0
There has been one AVX application since SB and BD were released: x264 HD. After that long, we still only have that one.
You can't compare AVX2 to AVX1.

AVX1 was a necessary intermediate step, but it's a crippled extension. The goal has always been AVX2: a complete 256-bit ISA with a vector equivalent of every scalar instruction, including gather. To get there they needed to:

1) Extend the registers to 256-bit
2) Make the integer SIMD execution stack capable of floating-point operations
3) Make the floating-point SIMD execution stack capable of integer operations
4) Implement gather support
5) Implement fused-multiply-add
6) Double the cache bandwidth

AVX1 is only the first two steps. AVX2 in Haswell completes the rest of it. It has always been clear to developers that AVX1 would have limited applicability and the gains would be limited. And thus the adoption has been slow. AVX2 is a completely different thing. It applies to any code that has loops with independent iterations. That's a lot of software out there. And the most fantastic thing is that you can get an eightfold parallelisation with minimal effort.
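
As an illustration of the kind of loop meant here, the sketch below combines an indexed table lookup (the access pattern a gather instruction serves) with a multiply-add (the shape that maps onto FMA). It is plain C with made-up names, not AVX2 intrinsics. With 32-bit floats a 256-bit register holds eight elements, which is where the eightfold figure comes from. Whether a compiler actually vectorizes it depends on the compiler and flags (for example `-O3 -mavx2 -mfma` on GCC or Clang) and is not guaranteed.

```c
#include <stddef.h>
#include <stdint.h>

/* out[i] = weight[i] * table[index[i]] + bias[i]
 *
 * Every iteration is independent of the others. The load table[index[i]]
 * reads from a data-dependent address, which is what gather is for; the
 * multiply followed by an add is a candidate for fused multiply-add. */
void lookup_mul_add(float *restrict out, const float *restrict table,
                    const int32_t *restrict index,
                    const float *restrict weight,
                    const float *restrict bias, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = weight[i] * table[index[i]] + bias[i];
    }
}
```

Without gather, the indexed load has to be done one element at a time, which is a common reason loops like this stay scalar on hardware that only has AVX1.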

The AVX2 specification has been available for a year now, and today most major compilers are ready to support it. That still leaves a year for the developers to press the compile button (and most of it would be in middleware, not application code).

Last but not least, you falsely assume that more cores means more performance. I've seen several applications run slower or gain no appreciable performance with more cores, simply because the synchronization overhead increases quadratically. And yes, this will greatly improve with TSX, but again we need middleware developers to press the compile button.
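
To see how synchronization cost can make extra cores counterproductive, here is a toy model. It takes the quadratic claim above at face value by charging a fixed cost for every pair of cores; that pairwise term, the function name and the numbers in the note below are assumptions for illustration, not measurements.

```c
#include <assert.h>

/* Toy model of run time, relative to a single-core time of
 * serial + parallel. The parallel part divides across n cores, while
 * synchronization is charged once per pair of cores: n * (n - 1) / 2. */
double modeled_time(double serial, double parallel, double sync_cost, int n)
{
    assert(n > 0);
    double pairs = (double)n * (double)(n - 1) / 2.0;
    return serial + parallel / (double)n + sync_cost * pairs;
}
```

With serial = 0.1, parallel = 0.9 and sync_cost = 0.01, the model gives 1.0 on one core, 0.385 on four cores and 0.40 on six, so in this made-up case the six-core run is slightly slower than the four-core one.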

Heck, we might be ready to move to 6-core in the mainstream market as soon as Haswell's 14 nm shrink, Broadwell. But there's no point in rushing it. AVX2 and TSX have to be used first.
 

Edrick

Golden Member
Feb 18, 2010
1,939
230
106
You're seriously deluding yourself if you think AMD developed BD thinking every software developer was going to bow down at its awesomeness, give AMD a reach around and start optimizing software for BD.

BD is actually going to perform better once software is recompiled to use the new instructions it has. The problem is that developers are not going to do that until Intel supports the same instructions (Haswell). So being the first is not always the best ;)
 

Edrick

Golden Member
Feb 18, 2010
1,939
230
106
You can't compare AVX2 to AVX1. AVX1 was a necessary intermediate step, but it's a crippled extension. The goal has always been AVX2: a complete 256-bit ISA with a vector equivalent of every scalar instruction, including gather. To get there they needed to:

1) Extend the registers to 256-bit
2) Make the integer SIMD execution stack capable of floating-point operations
3) Make the floating-point SIMD execution stack capable of integer operations
4) Implement gather support
5) Implement fused-multiply-add
6) Double the cache bandwidth

AVX1 is only the first two steps. AVX2 in Haswell completes the rest of it. It has always been clear to developers that AVX1 would have limited applicability and the gains would be limited. And thus the adoption has been slow. AVX2 is a completely different thing. It applies to any code that has loops with independent iterations. That's a lot of software out there. And the most fantastic thing is that you can get an eightfold parallelisation with minimal effort.

The AVX2 specification has been available for a year now, and today most major compilers are ready to support it. That still leaves a year for the developers to press the compile button (and most of it would be in middleware, not application code).

I am just going to copy and paste this every time someone comes along with the argument about how AVX1 was adopted slowly. Well said.
 

Lonbjerg

Diamond Member
Dec 6, 2009
4,419
0
0
@Lonbjerg

We have a year ahead of us, let's pray we will have enough apps ready for users to take advantage of them.

I'm not against AVX/AVX2, I am a supporter. ;)

ps: learn to have a conversation and stop calling other people's opinions lies. Counter what I'm saying with evidence or any logical conversation but stop the name-calling.

Don't want to be called out on a lie...well then don't lie:

More cores is the best answer for more performance for day one. By the time Haswell is released, more apps will be multithreaded to take advantage of the increased core count than apps supporting AVX/AVX2.

Technically it might be better and more efficient to increase the work done per core but AVX/AVX2 is an ISA. It needs to be implemented into the application and it will take years for AVX2 applications to be available on the desktop.

I'm not saying we don't need AVX/AVX2 but we will not be able to use that performance from day one.

That has suddenly been "transformed" into "not 10-20 applications"...that has been transformed into "many".

So from zero to 10-20 to many.

0 != 10-20 != "many"

Unless you want to blame Google Translate now?

TL;DR
Don't whine about lies, when lying...
 

Lonbjerg

Diamond Member
Dec 6, 2009
4,419
0
0
BD is actually going to perform better once software is recompiled to use the new instructions it has. The problem is that developers are not going to do that until Intel supports the same instructions (Haswell). So being the first is not always the best ;)

Kinda reminds me of 3Dnow ;)
 

bronxzv

Senior member
Jun 13, 2011
460
0
71
You will not see any tangible performance increase in legacy software going from IB to Haswell (from the specs we know today).

We can expect Haswell to have increased L1D/L2 cache bandwidth, going by the IDF Spring disclosures [1], so even very basic functions like memcpy and memset (typically used by pretty much any application) will enjoy a massive speedup with AVX-256 code vs SSE. That is unlike Sandy/Ivy, where there is 0% speedup due to the half-baked AVX-256 implementation (and thus no incentive for wide adoption). Any application merely linked with *today's* AVX-aware libraries will enjoy nice speedups without even a recompile.

[1] : ARCS002 Introduction to the upcoming Intel(R) Advanced Vector Extension 2, slide 62 with actual SSE to AVX2 speedups for example
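
The "without even a recompile" part relies on libraries choosing an implementation at run time. Below is a minimal sketch of that dispatch pattern, not the code of any real library: the type and function names are hypothetical, and both "implementations" simply call `memcpy` so the example stays portable. The CPU check uses the GCC/Clang builtin `__builtin_cpu_supports` on x86 and falls back to "no" elsewhere.

```c
#include <stddef.h>
#include <string.h>

typedef void (*copy_fn)(void *dst, const void *src, size_t n);

/* Stand-ins for two builds of the same routine. A real library would use
 * 128-bit (SSE) and 256-bit (AVX) loads and stores respectively. */
static void copy_sse_placeholder(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

static void copy_avx_placeholder(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
}

/* Returns 1 if the running CPU reports AVX support, 0 if it does not or
 * if this compiler/target offers no way to ask. */
int cpu_has_avx(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    return __builtin_cpu_supports("avx") ? 1 : 0;
#else
    return 0;
#endif
}

/* Pick the implementation once; callers keep using the same pointer. */
copy_fn select_copy(int has_avx)
{
    return has_avx ? copy_avx_placeholder : copy_sse_placeholder;
}
```

An application calling `select_copy(cpu_has_avx())` gets whichever path the hardware supports, so the same binary takes the wider code path on a newer CPU without being rebuilt.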
 

BenchPress

Senior member
Nov 8, 2011
392
0
0
You obviously don't understand CPUs, so arguing is pointless.
Then why are you arguing?
Enhancements to IPC have already been listed here for Haswell. Probably something like 5-10% will be the norm (again).
Again, which enhancements would that be exactly? Since "obviously" I don't understand CPUs, you must be able to tell me something I didn't know already...
The point of Haswell is AVX2 integration and incorporating existing GPGPU-like functionality that will quite literally enable some applications to have HUGE gains.
Yes, good job paraphrasing my previous posts. However, AVX2 doesn't have a single thing to do with IPC. So please tell me all about Haswell's IPC enhancing technology.
 

BenchPress

Senior member
Nov 8, 2011
392
0
0
That was kinda the first time AMD did a 1up on Intel, wasn't it...
No. Intel introduced SSE soon after AMD introduced 3DNow! and it was vastly superior. SSE registers don't alias the x87 registers like MMX registers do, and they are twice the width. It took AMD almost three years to implement SSE, and by that time Intel already had SSE2.

3DNow! was a failure, and AMD said they'll remove support for it from future processors.
 

bronxzv

Senior member
Jun 13, 2011
460
0
71
That doesn't help on the consumer side. I'm pretty sure Xeons will love it, but AVX apps are few and far between.

Well, for example, all of today's desktop applications using the IPP libraries http://software.intel.com/en-us/articles/intel-ipp/ are effectively using AVX if linked after January 2011 http://software.intel.com/en-us/art...r-intel-avx-intel-advanced-vector-extensions/, the speedups aren't that great though due to the limited L1D/L2 bandwidth on SNB/IVB targets
 