LGA 2011 Owners... The Xeon 8-Core/16-HT E5-2687W is finally here

Page 4 - AnandTech Forums

blckgrffn

Diamond Member
May 1, 2003
9,287
3,427
136
www.teamjuchems.com
Sandy Bridge is 12-15% faster than Lynnfield: http://www.computerbase.de/artikel/prozessoren/2011/test-intel-sandy-bridge/47/

http://www.tomshardware.com/reviews/processor-architecture-benchmark,2974-15.html

You'll likely never get to see it in person beyond synthetic benchmarks like Sandra and LinX, but it's used outside of the consumer market. And HPC will definitely gobble it up when Xeon E5 launches.

The PrimeGrid Distributed Computing project is AVX-enabled for a 20-50% performance gain on Intel CPUs. It pushes CPU power consumption up noticeably; the 2500K in my signature was running the AVX code path during our last race, and it cut the time to crunch a WU from 1400 seconds to sub-900 seconds.
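For the curious, those WU times actually imply a gain at (or past) the top of the quoted 20-50% range. A quick sanity check, using the numbers above and treating "sub 900" as 900:

```python
# Back-of-the-envelope check of the PrimeGrid AVX speedup described above.
# WU times are the ones quoted in the post; "sub 900" is taken as 900 here.
scalar_time = 1400  # seconds per work unit, non-AVX code path
avx_time = 900      # seconds per work unit, AVX code path (upper bound)

# Throughput gain: how many more WUs per hour the AVX path delivers.
throughput_gain = scalar_time / avx_time - 1
print(f"throughput gain: {throughput_gain:.0%}")  # ~56%
```

So on this workload the AVX path is worth roughly half again as much throughput per core.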

The i3 would have been rocking it too, but that is an ESXi box and I need to get to ESXi 5 in order for VMs to have access to AVX...

So, I have seen it and benefited from it :)

Here's to hoping they'll fix it for BD/PD and that other DC projects will adopt optimized code paths as well.
 

Edrick

Golden Member
Feb 18, 2010
1,939
230
106
The PrimeGrid Distributed Computing project is AVX-enabled for a 20-50% performance gain on Intel CPUs. It pushes CPU power consumption up noticeably; the 2500K in my signature was running the AVX code path during our last race, and it cut the time to crunch a WU from 1400 seconds to sub-900 seconds.

When did they update it? (what version?) I need to check it out.

I beta tested the AVX client for Seti@home and it only gave about a 5% performance gain. I do not think it was ever officially released.
 
Last edited:

blckgrffn

Diamond Member
May 1, 2003
9,287
3,427
136
www.teamjuchems.com
When did they update it? (what version?) I need to check it out.

I beta tested the AVX client for Seti@home and it only gave about a 5% performance gain. I do not think it was ever officially released.

Hmmm... let me dig up the link from the DC subforum. It wasn't "official", but it was implemented for all projects easily enough...

http://www.primegrid.com/forum_thread.php?id=3912#48719 <-- The "prime" site (har har)

http://forums.anandtech.com/showthread.php?t=2220025 <-- Our thread on it, some detailed instructions and local user results

During the race it was easy enough to click through WUs and identify folks who were AVX-enabled, as they were all under 1k seconds per WU; even a heavily OC'd SB wasn't getting under that limbo bar without it.

It is one of the rare occasions where the Thuban was really outclassed in pure DC performance by SB: it was taking ~1600-1800 seconds on an x6 to get through a WU, so even at six threads vs. four it was getting trounced.

I did track down what I thought was a stock-clocked 8120 from Japan during the race, and it was taking ~2300 seconds per WU. If AVX could be brought into play on BD, that could actually make them pretty competitive too.
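Putting rough numbers on that Thuban-vs-SB comparison (using the midpoint of the quoted Thuban range, 900 s for the AVX'd 2500K, and assuming each thread crunches one WU independently):

```python
# Rough per-box DC throughput comparison from the WU times quoted above.
# Assumes each thread crunches one WU independently; 1700 s is the midpoint
# of the 1600-1800 s range given for the Thuban x6.
def wu_per_hour(threads, seconds_per_wu):
    """Work units completed per hour with `threads` concurrent WUs."""
    return threads * 3600 / seconds_per_wu

thuban_x6 = wu_per_hour(6, 1700)   # ~12.7 WU/h across six threads
sb_2500k = wu_per_hour(4, 900)     # 16.0 WU/h across four threads, AVX path
print(f"Thuban x6: {thuban_x6:.1f} WU/h, 2500K (AVX): {sb_2500k:.1f} WU/h")
```

Even with two fewer threads, the AVX-enabled 2500K comes out well ahead on total throughput.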
 
Last edited:

BenchPress

Senior member
Nov 8, 2011
392
0
0
Ok, fair enough, I should have worded things differently...

Correct me if I'm wrong, but Sandy Bridge primarily achieved that speedup over Nehalem by improving the cache latencies. Nehalem had obvious flaws on that front. But how exactly would Haswell achieve an ~15% improvement in IPC over Sandy Bridge? In light of the law of diminishing returns that's a pretty dramatic increase! And since Haswell is already known to feature AVX2 and transactional memory, and also clearly aims for improved power efficiency, what technology would provide such an increase in IPC without costing too many extra transistors?

It seems you're only basing your expectation on past improvements, but that offers no guarantee at all that they'll achieve it once again, or even intend to. Just look at AMD. Most people expected an improvement in IPC, but Bulldozer sacrificed IPC in exchange for more cores. AMD probably wishes it had a 15% increase in IPC, but that simply wasn't possible without making the chip even bigger and more power hungry. For the sake of power efficiency, Intel might very well decide to make Haswell slightly less aggressive, thereby lowering the IPC, and for the high-end desktop market compensate it with slightly higher clocks (for which it clearly has headroom).

So can you name any technology which would offer a 15% increase in IPC over Sandy Bridge without jeopardizing other metrics?
 

BenchPress

Senior member
Nov 8, 2011
392
0
0
Have you not read any of the Haswell posts on these forums?
I have, but none of them explained how a 15% increase in IPC could be achieved.

There were some doubts about whether Haswell would be capable of two 256-bit FMA instructions per clock, but it was interesting to see how that could be achieved relatively easily from Sandy Bridge's existing execution cluster. Likewise the explanation for doubling the cache bandwidth is very convincing. But unless I missed it, nobody has given a reasonable explanation for a 15% increase in IPC for legacy code.
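To make the dual-FMA point concrete, here is a rough peak-throughput sketch. The Sandy Bridge baseline (one 256-bit FP add port plus one 256-bit FP multiply port) is known; the two-FMA-port Haswell figure is the speculated configuration discussed above:

```python
# Peak single-precision FLOPs per cycle per core, from the port counts
# discussed above (a 256-bit vector holds 8 single-precision lanes).
LANES = 256 // 32

# Sandy Bridge: one 256-bit FP add port + one 256-bit FP multiply port.
snb_flops = 2 * LANES        # 16 FLOPs/cycle

# Haswell (as speculated here): two 256-bit FMA ports, each doing a
# fused multiply-add = 2 FLOPs per lane per cycle.
hsw_flops = 2 * LANES * 2    # 32 FLOPs/cycle

print(snb_flops, hsw_flops)
```

Peak throughput doubles, but only for FMA-friendly vector code, which is exactly why this says nothing about IPC on legacy scalar workloads.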

Haswell sounds pretty much like Sandy Bridge + AVX2. If we assume that Sandy Bridge was tuned to near perfection, then there are no obvious ways to increase IPC. So anyone expecting a 15% increase must think otherwise, and all I'm asking is for someone to point out this substantial room for improvement that can be exploited without jeopardizing other design goals.
 

IntelUser2000

Elite Member
Oct 14, 2003
8,686
3,786
136
Correct me if I'm wrong, but Sandy Bridge primarily achieved that speedup over Nehalem by improving the cache latencies. Nehalem had obvious flaws on that front.

There's no single biggest contributor to Sandy Bridge's performance gain. Ivy Bridge's L3 cache is another 20-30% faster, yet the gain from cache is probably only 1-3%. Mind you, Core 2 Duo didn't do much better than Sandy Bridge did: its advancement over Core Duo was 15-20% per clock.

Sandy Bridge's performance gains are likely split among the doubled load ports, improved cache architecture, and better branch prediction/OoO. Features like the uop cache are mainly there to lower power; the performance impact is said to be insignificant.

Even Ivy Bridge, a mere shrink to 22nm, brings 4-6% IPC increase. There's no reason to think Haswell, a significant architectural change, will benefit less than that.
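To illustrate how per-generation IPC gains stack, here is a toy compounding calculation; the ~5% Ivy Bridge step is taken from the range above, while the 10% Haswell step is purely hypothetical:

```python
# Illustrative compounding of per-generation IPC gains over Sandy Bridge.
# The ~5% Ivy Bridge figure is from the post above; the 10% Haswell figure
# is hypothetical, just to show how per-step gains multiply.
def cumulative_gain(*step_gains):
    """Combine successive fractional IPC gains multiplicatively."""
    total = 1.0
    for g in step_gains:
        total *= 1 + g
    return total - 1

# Ivy Bridge +5%, then a hypothetical Haswell +10% on top of that:
print(f"{cumulative_gain(0.05, 0.10):.1%} over Sandy Bridge")  # 15.5%
```

The point is only that modest per-step gains multiply: two unremarkable steps already land in the mid-teens relative to Sandy Bridge.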

AMD probably wishes it had a 15% increase in IPC, but that simply wasn't possible without making the chip even bigger and more power hungry.
I don't think they did. Ideally it was all about Turbo Core related gains and clock speed + extra cores. It would have been fine, had their idea worked. The rest of the hype and claims were all marketing at work and fanboy fantasies.
 

BenchPress

Senior member
Nov 8, 2011
392
0
0
Mind you, Core 2 Duo didn't do much better than Sandy Bridge did: its advancement over Core Duo was 15-20% per clock.
Well, duh, Core 2 added a third arithmetic execution port!
Sandy Bridge's performance gains are likely split among the doubled load ports, improved cache architecture, and better branch prediction/OoO.
Alright, but what's left for Haswell then? You still haven't answered that question.
Even Ivy Bridge, a mere shrink to 22nm, brings 4-6% IPC increase. There's no reason to think Haswell, a significant architectural change, will benefit less than that.
Ivy Bridge is slightly more than a mere shrink. It executes MOV instructions at the register renaming stage. And since it can rename 4 registers per clock and doesn't occupy any of the 3 execution ports this removes a potential bottleneck. It also dynamically partitions Hyper-Threading resources. So we know what offers the increase in IPC for Ivy Bridge.

What we don't know, is how Haswell could possibly achieve 15% on top of that. In the past there were always people who pointed out some technology which has the potential to realize such an increase, but now that performance/Watt is a crucial metric that's not so obvious any more. So if you expect something as substantial as 15% then I'd like you to put your money where your mouth is and take a stab at how it can be done.
I don't think they did. Ideally it was all about Turbo Core related gains and clock + extra cores. It would have been fine, had their idea worked. Rest of the hype and claims were all marketing at work and fanboy fantasies.
With all due respect, expecting a 15% increase in IPC for Haswell sounds like the same kind of fanboy fantasy to me, unless you can back it up with plausible ways to achieve that...

I'd rather be surprised that they do pull it off while nobody with technical insight expected it, instead of getting disappointed because people expected it solely based on Intel's past achievements. Unlike wine, technology doesn't improve over time by sitting on your hands. There have to be hard-earned advances.
 
Mar 10, 2006
11,715
2,012
126
Well, duh, Core 2 added a third arithmetic execution port!

Alright, but what's left for Haswell then? You still haven't answered that question.

Ivy Bridge is slightly more than a mere shrink. It executes MOV instructions at the register renaming stage. And since it can rename 4 registers per clock and doesn't occupy any of the 3 execution ports this removes a potential bottleneck. It also dynamically partitions Hyper-Threading resources. So we know what offers the increase in IPC for Ivy Bridge.

What we don't know, is how Haswell could possibly achieve 15% on top of that. In the past there were always people who pointed out some technology which has the potential to realize such an increase, but now that performance/Watt is a crucial metric that's not so obvious any more. So if you expect something as substantial as 15% then I'd like you to put your money where your mouth is and take a stab at how it can be done.

With all due respect, expecting a 15% increase in IPC for Haswell sounds like the same kind of fanboy fantasy to me, unless you can back it up with plausible ways to achieve that...

I'd rather be surprised that they do pull it off while nobody with technical insight expected it, instead of getting disappointed because people expected it solely based on Intel's past achievements. Unlike wine, technology doesn't improve over time by sitting on your hands. There have to be hard-earned advances.

The architects at Intel who are paid >$150K/yr to figure this out will figure it out. And they've been working on Haswell since 2007/2008...
 
Last edited:

Ferzerp

Diamond Member
Oct 12, 1999
6,438
107
106
Why are you making the "if I can't figure out how it can be done, it can't be done!" argument?

Are you of the caliber that you should be on Intel's CPU design team?
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
I have, but none of them explained how a 15% increase in IPC could be achieved.

There were some doubts about whether Haswell would be capable of two 256-bit FMA instructions per clock, but it was interesting to see how that could be achieved relatively easily from Sandy Bridge's existing execution cluster. Likewise the explanation for doubling the cache bandwidth is very convincing. But unless I missed it, nobody has given a reasonable explanation for a 15% increase in IPC for legacy code.

Haswell sounds pretty much like Sandy Bridge + AVX2. If we assume that Sandy Bridge was tuned to near perfection, then there are no obvious ways to increase IPC. So anyone expecting a 15% increase must think otherwise, and all I'm asking is for someone to point out this substantial room for improvement that can be exploited without jeopardizing other design goals.

It's not the same thing, but I have wondered if Intel's circuit design techniques relating to power-gating and turbo clocking will make their way into finer granularity than the core level.

I'm thinking of those double-pumped circuits in netburst. I wonder if it would be possible or beneficial to create turbo-clocked domains within the core itself for specific circuits that could be clocked substantially higher than the standard base clock speed to boost the effective IPC for given instructions?

Not necessarily talking about doubling the clockspeed, and not meaning to say it would be the same as double-pumped circuits, but just invoking that model as an example to speak to here. (hyperthreading was resurrected from netburst as well, after all)
 

Ferzerp

Diamond Member
Oct 12, 1999
6,438
107
106
Remind me, how long did AMD work on Bulldozer? ;)


Since the Pentium days, Intel, on their mainstream x86 side, has had only a single misstep (netburst). Not living up to what they market is the exception, as opposed to the norm. If they say 10% across the board, you can bet that for most things it will be around a 10% improvement. It's not like they say 10%, and then the real improvement is +/- 5% depending upon the application.

As long as they keep meeting their claims for the next releases, we have no reason to disbelieve them. Now, take any 5+ year forecasts with a grain of salt.
 

Ferzerp

Diamond Member
Oct 12, 1999
6,438
107
106
I guess that's a clever attempt to skip Atom ;)


No, they've had problems with their forays into other areas (Larrabee, Atom, arguably Itanium). I wasn't trying to argue that. That's why I was clear about mainstream x86.
 

NTMBK

Lifer
Nov 14, 2011
10,298
5,289
136
No, they've had problems with their forays into other areas (Larrabee, Atom, arguably Itanium). I wasn't trying to argue that. That's why I was clear about mainstream x86.

Arguably? It's nicknamed Itanic for a reason ;)
 

blckgrffn

Diamond Member
May 1, 2003
9,287
3,427
136
www.teamjuchems.com
Since the Pentium days, Intel, on their mainstream x86 side, has had only a single misstep (netburst). Not living up to what they market is the exception, as opposed to the norm. If they say 10% across the board, you can bet that for most things it will be around a 10% improvement. It's not like they say 10%, and then the real improvement is +/- 5% depending upon the application.

As long as they keep meeting their claims for the next releases, we have no reason to disbelieve them. Now, take any 5+ year forecasts with a grain of salt.


Haha, it wasn't like Netburst was a one-and-done thing. It dragged out over what, five or six years? From ~1.3GHz to 3.8GHz? SDRAM & RDRAM to DDR3?
 

BenchPress

Senior member
Nov 8, 2011
392
0
0
The architects at Intel who are paid >$150K/yr to figure this out will figure it out. And they've been working on Haswell since 2007/2008...
Sigh. If you don't have a clue, then just say you don't have a clue but that's what you hope will happen. No shame in that.

The thing is, if I said AMD's next major architecture might have a 30% increase in IPC, I could fairly easily justify that. But it reeks of fanboyism to claim that's not going to make any difference because Haswell will have a 15% increase in IPC, on top of the lead they already have.

I'm trying to learn something new here. In the world of technology there's no point in expecting past advances to be achieved again in the future. Diminishing returns prevent doing the same thing again and getting the same result, so there has to be innovation.
 

tweakboy

Diamond Member
Jan 3, 2010
9,517
2
81
www.hammiestudios.com
Why not bring at least 8 cores to the desktop enthusiast?

We've been on 4 cores for over 5 years. OK, fine, HT gives 8 threads... but technology in the core-count department has stalled. Yes, they've made the cores faster, but people who do DAW work want more cores, more RAM, and 128GB boards.
 

Smartazz

Diamond Member
Dec 29, 2005
6,128
0
76
Why not bring at least 8 cores to the desktop enthusiast?

We've been on 4 cores for over 5 years. OK, fine, HT gives 8 threads... but technology in the core-count department has stalled. Yes, they've made the cores faster, but people who do DAW work want more cores, more RAM, and 128GB boards.

Intel probably figures that a vast majority of users' needs will be met with the 1155 platform. They want to also lure potential consumers to their more expensive 2011 platform. From a business standpoint it makes sense.
 

Smartazz

Diamond Member
Dec 29, 2005
6,128
0
76
Haha, it wasn't like Netburst was a one-and-done thing. It dragged out over what, five or six years? From ~1.3GHz to 3.8GHz? SDRAM & RDRAM to DDR3?

Intel still managed to absolutely dominate AMD in sales even back then. Not only has Intel always been that much bigger than AMD, but they also cheated a bit to keep AMD out of Dell computers.