David Kanter dissects Haswell

Phynaz · Nov 14, 2012

http://www.realworldtech.com/haswell-cpu/

Makes me want one

TuxDave · Nov 14, 2012

I was reading that earlier. Nice deep dive.

Vesku · Nov 14, 2012

Recompile all the programs!

Wonder if we should start pestering software companies now to actually do so, can take quite a while for retail software.

ShintaiDK · Nov 14, 2012

Vesku said:
Recompile all the programs!

Wonder if we should start pestering software companies now to actually do so, can take quite a while for retail software.

VS2012 and Intel compilers for example already support AVX2 code today.

Vesku · Nov 14, 2012

But there's also TSX and RTM.

Makaveli · Nov 14, 2012

great article... alittle to tech heavy for me now had a couple drinks.

But will make for a great read tomorrow at work!

ShintaiDK · Nov 14, 2012

Vesku said:
But there's also TSX and RTM.

Also supported in VS2012 and Intel compilers (v13). And from GCC 4.8.

Cerb · Nov 14, 2012

Vesku said:
Recompile all the programs!

Wonder if we should start pestering software companies now to actually do so, can take quite a while for retail software.

AVX2 has features that have been desired for quite some time, now, and Intel has had the spec, and software tools, out for some time; so content creation application makers will have been able to make a decision about it already, and likely will have already been working on adding support, if they decided to.

There shouldn't be any need to pester anybody. Remember how Intel wanted everyone to follow their high-speed dream with Netburst, and everybody more or less went, "Uh, Hell no, guys," and it wasn't Intel's finest hour? Yeah, well, Intel didn't repeat that mistake. AVX2 and HTM are examples of giving the customers what they want, plain and simple.

Vesku said:
But there's also TSX and RTM.

Those will take a good bit of time. A compiler might be able to automatically elide a few locks, but generally, you're going to need to carefully implement that kind of thing (on the bright side, careful is a matter of verifying correctness: very little code changes will be needed for elision, and it shouldn't break anything on uarches not supporting it, x86 or not). Unlike AVX2, as well, it is more of a future need, as far as our client PCs go. HPC and big DB users could start taking advantage of it ASAP. Like 64-bit support almost 10 years ago (or half of what went into the 386, for that matter), it's a case of adding something before it's really needed.

Transactional memory has been known for at least a decade now to be the best way to improve scaling, without getting rid of the benefits of a lock-using system; but, sadly, while straight-forward, it's not remotely simple or elegant. By around '07, pretty much all the hardware-level worries had been figured out, so now it goes into our CPUs. HPC users may start utilizing it within months of getting Haswell-based clusters, and the rest of us will find it trickling into our sync-limited multithreaded applications, over the course of the next 5-10 years. We're not yet in desperate need of it, but when that time will be is a big question mark, and we can go ahead and make use of it, so nobody wants to be late in supporting it.

meloz · Nov 15, 2012

Absolutely love David's work. This was another gem.

His article on TSX is still the best explanation I have found on the 'net.

2is · Nov 15, 2012

I'm looking forward to haswell, I don't think I'll be upgrading my desktop to one, but I'm long overdue for a new laptop.

blastingcap · Nov 15, 2012

With CPU idle power usage getting lower and lower, it's time mobo makers caught up. It's sad to have such low-idling CPUs and then have the mobo and RAM and everything else eat up so much more wattage.

cytg111 · Nov 15, 2012

Vesku said:
Recompile all the programs!

Wonder if we should start pestering software companies now to actually do so, can take quite a while for retail software.

hrmmm .. one of these genius compiler guys should invent the binary compiler. Take a binary, say targetted 386 and recompile it towards a new arch.
Should be doable.

2is · Nov 15, 2012

blastingcap said:
With CPU idle power usage getting lower and lower, it's time mobo makers caught up. It's sad to have such low-idling CPUs and then have the mobo and RAM and everything else eat up so much more wattage.

This is likely the reason Intel took matters into their own hands and will have VRMs integrated into haswell.

Cerb · Nov 15, 2012

cytg111 said:
hrmmm .. one of these genius compiler guys should invent the binary compiler. Take a binary, say targetted 386 and recompile it towards a new arch.
Should be doable.

But, what benefit will it have? Only with SSE2 code would you get any benefit, and those programs are likely to have even better AVX2 support added in the near future, anyway. For everything else, it would be a chore, and would probably not be much better than the OS thunking it, instead, if it can't run it natively (pure 32-bit 386 code already runs quite well on modern Intel CPUs).

Interpreted and JIT VMs have been made to handle converting to new systems, but they simply don't have the benefit of the original source code's ASTs, to help target the new computer optimally, and they must mimic the effects of every single instruction, in case a side effect was being used for some purpose (elimination of such should be possible, of course, but probably at some very high development and compiler time cost).

Haserath · Nov 15, 2012

blastingcap said:
With CPU idle power usage getting lower and lower, it's time mobo makers caught up. It's sad to have such low-idling CPUs and then have the mobo and RAM and everything else eat up so much more wattage.

http://www.anandtech.com/show/6355/intels-haswell-architecture/3

Quite a bit has to be done it seems. We may get there eventually for desktop...

Idontcare · Nov 15, 2012

Cerb said:
But, what benefit will it have? Only with SSE2 code would you get any benefit, and those programs are likely to have even better AVX2 support added in the near future, anyway. For everything else, it would be a chore, and would probably not be much better than the OS thunking it, instead, if it can't run it natively (pure 32-bit 386 code already runs quite well on modern Intel CPUs).

Interpreted and JIT VMs have been made to handle converting to new systems, but they simply don't have the benefit of the original source code's ASTs, to help target the new computer optimally, and they must mimic the effects of every single instruction, in case a side effect was being used for some purpose (elimination of such should be possible, of course, but probably at some very high development and compiler time cost).

Reminds me of DEC's FX!32 software.

Emulation has been around for a while as a concept, but FX!32 went one stage further. It analysed the way programs worked and in real time, developed dynamic-link library (DLL) files of native Alpha code that the application could call upon next time it ran.

cytg111 · Nov 15, 2012

Idontcare said:
Reminds me of DEC's FX!32 software.

Not a bad idea IMO and in 'our' case its not a totally different arch.

But, what benefit will it have?

- You dont have to wait for your favorite software vendor to get benefit from your new arch. Your software vendor may never get around to it or may even not be in business anymore. There's a ton of scenarios where this makes sense IMO.

sm625 · Nov 15, 2012

I really dont understand the 10 watt thing. I have a penryn CULV notebook that has a 10W cpu. It is a die shrink of a 65nm core originally designed about 8 years ago. In that time, we have quadrupled the transistor budget, and reduced voltage by 20%. Everything seems to indicate we should be able to get 2.0GHz Core 2 cpu performance, plus i3-2310M gpu performance from a 5 watt package. If itnel cannot raise the bar at least to that level after 8 years, they deserve to bleed another billion or two to apple.

Idontcare · Nov 15, 2012

sm625 said:
I really dont understand the 10 watt thing. I have a penryn CULV notebook that has a 10W cpu. It is a die shrink of a 65nm core originally designed about 8 years ago. In that time, we have quadrupled the transistor budget, and reduced voltage by 20%. Everything seems to indicate we should be able to get 2.0GHz Core 2 cpu performance, plus i3-2310M gpu performance from a 5 watt package. If itnel cannot raise the bar at least to that level after 8 years, they deserve to bleed another billion or two to apple.

The difference of course is the amount of work, i.e. compute, being done with those 10 watts.

Haswell at 10W will probably do 2-3x the total calculations per second that a 10W CULV penryn would acheive.

The power is definitely a challenge though, apparently. My 3770k for example, when I underclock it to the lowest multi (16x) and optimize the voltage to be as low as possible while remaining stable for LinX operation it still consumes 12-13W (at 1.6GHz, 0.636V, 36°C).

That means at most my 3770k could be clocked at 1.2GHz if it was to fit inside a 10W power envelope. It isn't easy to scale down, which is why Atom was created.

podspi · Nov 15, 2012

This is one of the reasons I'm very excited about the new Atoms. For many tasks, C2D level performance is fine, and I bet Atom will be able to reach to lower power levels than Haswell will.

That being said, Haswell looks like it'll be fantastic for Ultraportables.

moonbogg · Nov 15, 2012

He says he estimates Haswell to have 10% better performance than Sandy bridge for current software. Does that mean 5% better than Ivy Bridge then? I'm hoping for a beast gaming chip that will ruin my 3930k. I have fear that this won't happen.

fov001 · Nov 15, 2012

hey guys....I am new here so please be easy on judgment

question, when will Haswell be available?

I'm about to purchase a set of SSD (Samsung 840Pro x2 128GB RAID0) but would rather wait for this bad boy instead.

Lonyo · Nov 15, 2012

Probably 6+ months.

fov001 · Nov 15, 2012

Lonyo said:
Probably 6+ months.

oops. I was actually talking about Intel S3700. wrong thread lol

Haserath · Nov 15, 2012

moonbogg said:
He says he estimates Haswell to have 10% better performance than Sandy bridge for current software. Does that mean 5% better than Ivy Bridge then? I'm hoping for a beast gaming chip that will ruin my 3930k. I have fear that this won't happen.

He mentions branch prediction improvements and memory improvements. Those are probably the two biggest enhancements.

We'll see what Haswell can do with clocks on 22nm since Intel says they opted for execution speed over IC quality on IVB.
It would be nice to see them do something with cache sizes next time. That may at least improve it some more even though that's probably not completely efficient.

Though I did hear Haswell might have 1MB L2 caches?, which would be kind of nice for us gamers.

David Kanter dissects Haswell

Lifer

Lifer

Diamond Member

Lifer

Diamond Member

Diamond Member

Lifer

Elite Member

Senior member

Diamond Member

Diamond Member

Lifer

Diamond Member

Elite Member

Senior member

Elite Member

Lifer

Diamond Member

Elite Member

Golden Member

Lifer

Junior Member

Lifer

Junior Member

Senior member