|
|
 |
|
11-14-2012, 07:29 PM
|
#1
|
|
Diamond Member
Join Date: Mar 2006
Posts: 5,257
|
David Kanter dissects Haswell
|
|
|
11-14-2012, 07:50 PM
|
#2
|
|
Lifer
Join Date: Oct 2002
Posts: 10,048
|
I was reading that earlier. Nice deep dive.
__________________
post count = post count + 0.999.....
(\__/)
(='.'=)This is Bunny. Copy and paste bunny into your
(")_(")signature to help him gain world domination.
|
|
|
11-14-2012, 07:51 PM
|
#3
|
|
Platinum Member
Join Date: Aug 2005
Location: Seattle, WA
Posts: 2,171
|
Recompile all the programs!
Wonder if we should start pestering software companies now to actually do so, can take quite a while for retail software.
|
|
|
11-14-2012, 08:05 PM
|
#4
|
|
Diamond Member
Join Date: Apr 2012
Location: Copenhagen
Posts: 5,901
|
Quote:
Originally Posted by Vesku
Recompile all the programs!
Wonder if we should start pestering software companies now to actually do so, can take quite a while for retail software.
|
VS2012 and Intel compilers for example already support AVX2 code today.
__________________
MiniITX - Intel i5 4670
Board - Intel DH87FB
SSD - Crucial M500 480GB mSATA
Memory - Crucial Ballistix Sport 2x8GB 1600Mhz 1.35V
Case - Sugo SG08B with 600W PSU
GPU - Zotac GTX 680 2GB
|
|
|
11-14-2012, 08:20 PM
|
#5
|
|
Platinum Member
Join Date: Aug 2005
Location: Seattle, WA
Posts: 2,171
|
But there's also TSX and RTM.
|
|
|
11-14-2012, 08:28 PM
|
#6
|
|
Platinum Member
Join Date: Feb 2002
Location: Ontario, Canada
Posts: 2,545
|
Great article... a little too tech heavy for me now that I've had a couple drinks.
But it will make for a great read tomorrow at work!
__________________
Intel Core i7 970 HT@4.0 Ghz 1.24v | TRUE Black Rev.C + Scythe S-Flex 1600 rpm x2 | Asus P6-T Deluxe V2 12GB Mushkin DDR3-1600 7-8-7-20 1T | 7970 Ghz Twin Frozr 3GB | EVGA 650 SC Physx | Logitech G15+G500 Win 7 x64 | Intel 320GB G2 Raid 0 | WD 1TB Black Storage | ESATA 2TB Green | CM 690 II Advanced | Razor Vespula | D-link DGL-4500 | HP ZR24w | Logitech Z560 | X-FI Titanium | Corsair Pro Series Gold AX750
|
|
|
11-14-2012, 08:32 PM
|
#7
|
|
Diamond Member
Join Date: Apr 2012
Location: Copenhagen
Posts: 5,901
|
Quote:
Originally Posted by Vesku
But there's also TSX and RTM.
|
Also supported in VS2012 and Intel compilers (v13). And from GCC 4.8.
__________________
MiniITX - Intel i5 4670
Board - Intel DH87FB
SSD - Crucial M500 480GB mSATA
Memory - Crucial Ballistix Sport 2x8GB 1600Mhz 1.35V
Case - Sugo SG08B with 600W PSU
GPU - Zotac GTX 680 2GB
|
|
|
11-14-2012, 08:37 PM
|
#8
|
|
Lifer
Join Date: Aug 2000
Posts: 12,262
|
Quote:
Originally Posted by Vesku
Recompile all the programs!
Wonder if we should start pestering software companies now to actually do so, can take quite a while for retail software.
|
AVX2 has features that have been desired for quite some time, now, and Intel has had the spec, and software tools, out for some time; so content creation application makers will have been able to make a decision about it already, and likely will have already been working on adding support, if they decided to.
There shouldn't be any need to pester anybody. Remember how Intel wanted everyone to follow their high-speed dream with Netburst, and everybody more or less went, "Uh, Hell no, guys," and it wasn't Intel's finest hour? Yeah, well, Intel didn't repeat that mistake. AVX2 and HTM are examples of giving the customers what they want, plain and simple.
Quote:
Originally Posted by Vesku
But there's also TSX and RTM.
|
Those will take a good bit of time. A compiler might be able to automatically elide a few locks, but generally you're going to need to implement that kind of thing carefully (on the bright side, careful is a matter of verifying correctness: very few code changes will be needed for elision, and it shouldn't break anything on uarches not supporting it, x86 or not). Unlike AVX2, it is also more of a future need as far as our client PCs go. HPC and big DB users could start taking advantage of it ASAP. Like 64-bit support almost 10 years ago (or half of what went into the 386, for that matter), it's a case of adding something before it's really needed.
Transactional memory has been known for at least a decade now to be the best way to improve scaling without getting rid of the benefits of a lock-using system; but, sadly, while straightforward, it's not remotely simple or elegant. By around '07, pretty much all the hardware-level worries had been figured out, so now it goes into our CPUs. HPC users may start utilizing it within months of getting Haswell-based clusters, and the rest of us will find it trickling into our sync-limited multithreaded applications over the course of the next 5-10 years. We're not yet in desperate need of it, but when that time will come is a big question mark, and we can go ahead and make use of it now, so nobody wants to be late in supporting it.
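For anyone who wants to see the shape of it, here is a rough software analogy of the "try it optimistically, fall back to the lock" pattern, in Python. This is only a sketch under loud assumptions: it does not use TSX at all, the version counter merely stands in for the conflict detection the hardware would do, and the class and method names are hypothetical.

```python
import threading

class ElidableCounter:
    """Software analogy only: NOT real TSX. All names here are hypothetical."""

    def __init__(self):
        self.value = 0
        self.version = 0               # bumped on every commit; stands in for hardware conflict detection
        self._lock = threading.Lock()  # the conventional lock the fast path tries not to hold

    def _try_commit(self, seen_version, new_value):
        # Models the atomic commit hardware would perform at the end of a transaction.
        with self._lock:
            if self.version != seen_version:
                return False           # someone else committed first: abort
            self.value = new_value
            self.version += 1
            return True

    def update(self, fn, max_attempts=3):
        # Optimistic path: do the work without holding the lock, then try to commit.
        for _ in range(max_attempts):
            seen = self.version
            snapshot = self.value
            candidate = fn(snapshot)
            if self._try_commit(seen, candidate):
                return "transaction"
        # Fallback path: plain locking, which always makes progress.
        with self._lock:
            self.value = fn(self.value)
            self.version += 1
            return "fallback"
```

The point of the structure is that the locked fallback is the same code you would have written anyway, which is why correctness checking matters more than the amount of code changed.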
__________________
"The computer can't tell you the emotional story. It can give you the exact mathematical design, but what's missing is the eyebrows." - Frank Zappa
Last edited by Cerb; 11-14-2012 at 08:42 PM.
|
|
|
11-14-2012, 11:29 PM
|
#9
|
|
Member
Join Date: Jul 2008
Posts: 156
|
Absolutely love David's work. This was another gem.
His article on TSX is still the best explanation I have found on the 'net.
|
|
|
11-14-2012, 11:45 PM
|
#10
|
|
Golden Member
Join Date: Apr 2012
Posts: 1,900
|
I'm looking forward to Haswell. I don't think I'll be upgrading my desktop to one, but I'm long overdue for a new laptop.
__________________
Intel i7 3770K|240GB Intel SSD 520|Asus P8Z77-V Pro|2x GTX 680 SLI (2GB)|180GB Corsair Force SSD|Corsair TX750|2x8GB DDR3 1600 (1.35v)
|
|
|
11-15-2012, 12:42 AM
|
#11
|
|
Diamond Member
Join Date: Sep 2010
Posts: 4,626
|
With CPU idle power usage getting lower and lower, it's time mobo makers caught up. It's sad to have such low-idling CPUs and then have the mobo and RAM and everything else eat up so much more wattage.
__________________
Quote:
Originally Posted by BoFox
We had to suffer polygonal boobs for a decade because of selfish corporate reasons.
|
Main: 3570K + HD7970 + 16GB 1866 + AsRock Extreme4 Z77 + Eyefinity 5760x1080 eIPS
NAS and HTPC/workstation: Supermicro MBD-X9SCM + G530 + 16GB ECC; ASUS P8B WS + i3-3220; 1.168TB of Intel/Crucial/Samsung SSDs + 26TB of WD/Hitachi HDDs
|
|
|
11-15-2012, 12:59 AM
|
#12
|
|
Senior Member
Join Date: Mar 2008
Posts: 746
|
Quote:
Originally Posted by Vesku
Recompile all the programs!
Wonder if we should start pestering software companies now to actually do so, can take quite a while for retail software.
|
hrmmm .. one of these genius compiler guys should invent the binary compiler. Take a binary, say one targeted at the 386, and recompile it for a new arch.
Should be doable.
__________________
Quote:
Please dont deal in absolutes.
Everything in the verse is percentages. Everything.
-With the exception of the love for our children.
(cytg 2001)
|
|
|
|
11-15-2012, 01:04 AM
|
#13
|
|
Golden Member
Join Date: Apr 2012
Posts: 1,900
|
Quote:
Originally Posted by blastingcap
With CPU idle power usage getting lower and lower, it's time mobo makers caught up. It's sad to have such low-idling CPUs and then have the mobo and RAM and everything else eat up so much more wattage.
|
This is likely the reason Intel took matters into their own hands and will have VRMs integrated into Haswell.
__________________
Intel i7 3770K|240GB Intel SSD 520|Asus P8Z77-V Pro|2x GTX 680 SLI (2GB)|180GB Corsair Force SSD|Corsair TX750|2x8GB DDR3 1600 (1.35v)
|
|
|
11-15-2012, 01:19 AM
|
#14
|
|
Lifer
Join Date: Aug 2000
Posts: 12,262
|
Quote:
Originally Posted by cytg111
hrmmm .. one of these genius compiler guys should invent the binary compiler. Take a binary, say one targeted at the 386, and recompile it for a new arch.
Should be doable.
|
But, what benefit will it have? Only with SSE2 code would you get any benefit, and those programs are likely to have even better AVX2 support added in the near future, anyway. For everything else, it would be a chore, and would probably not be much better than the OS thunking it, instead, if it can't run it natively (pure 32-bit 386 code already runs quite well on modern Intel CPUs).
Interpreted and JIT VMs have been made to handle converting to new systems, but they simply don't have the benefit of the original source code's ASTs, to help target the new computer optimally, and they must mimic the effects of every single instruction, in case a side effect was being used for some purpose (elimination of such should be possible, of course, but probably at some very high development and compiler time cost).
__________________
"The computer can't tell you the emotional story. It can give you the exact mathematical design, but what's missing is the eyebrows." - Frank Zappa
|
|
|
11-15-2012, 01:27 AM
|
#15
|
|
Senior Member
Join Date: Sep 2010
Posts: 561
|
Quote:
Originally Posted by blastingcap
With CPU idle power usage getting lower and lower, it's time mobo makers caught up. It's sad to have such low-idling CPUs and then have the mobo and RAM and everything else eat up so much more wattage.
|
http://www.anandtech.com/show/6355/i...architecture/3
Quite a bit has to be done it seems. We may get there eventually for desktop...
|
|
|
11-15-2012, 06:32 AM
|
#16
|
|
Administrator Elite Member
Join Date: Oct 1999
Posts: 19,182
|
Quote:
Originally Posted by Cerb
But, what benefit will it have? Only with SSE2 code would you get any benefit, and those programs are likely to have even better AVX2 support added in the near future, anyway. For everything else, it would be a chore, and would probably not be much better than the OS thunking it, instead, if it can't run it natively (pure 32-bit 386 code already runs quite well on modern Intel CPUs).
Interpreted and JIT VMs have been made to handle converting to new systems, but they simply don't have the benefit of the original source code's ASTs, to help target the new computer optimally, and they must mimic the effects of every single instruction, in case a side effect was being used for some purpose (elimination of such should be possible, of course, but probably at some very high development and compiler time cost).
|
Reminds me of DEC's FX!32 software.
Quote:
|
Emulation has been around for a while as a concept, but FX!32 went one stage further. It analysed the way programs worked and in real time, developed dynamic-link library (DLL) files of native Alpha code that the application could call upon next time it ran.
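A rough sketch of that "emulate the first time, reuse a translated copy the next time" idea, in Python. Everything in it is hypothetical: the two-instruction toy ISA, the in-memory cache, and the function names are made up for illustration, and translating real x86 code to native Alpha code is far more involved than this.

```python
# Toy "translate once, reuse on the next run" cache. Hypothetical names throughout.

def interpret(program, x):
    """Slow path: decode and execute every instruction on every run."""
    acc = x
    for op, arg in program:
        if op == "add":
            acc += arg
        elif op == "mul":
            acc *= arg
        else:
            raise ValueError(f"unknown op {op!r}")
    return acc

def translate(program):
    """Translate the whole program once into a single Python function."""
    lines = ["def translated(acc):"]
    for op, arg in program:
        if op == "add":
            lines.append(f"    acc += {arg!r}")
        elif op == "mul":
            lines.append(f"    acc *= {arg!r}")
        else:
            raise ValueError(f"unknown op {op!r}")
    lines.append("    return acc")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["translated"]

_translation_cache = {}  # stand-in for the saved native code

def run(program, x):
    """First run is interpreted; later runs call the cached translation."""
    key = tuple(program)
    fn = _translation_cache.get(key)
    if fn is None:
        result = interpret(program, x)
        _translation_cache[key] = translate(program)
        return result, "interpreted"
    return fn(x), "translated"
```

Calling `run` twice on the same program takes the interpreted path first and the translated path afterwards, with the same result both times.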
|
|
|
|
11-15-2012, 09:47 AM
|
#17
|
|
Senior Member
Join Date: Mar 2008
Posts: 746
|
Quote:
Originally Posted by Idontcare
Reminds me of DEC's FX!32 software.
|
Not a bad idea IMO, and in 'our' case it's not a totally different arch.
Quote:
|
But, what benefit will it have?
|
- You don't have to wait for your favorite software vendor to get the benefit of your new arch. Your software vendor may never get around to it, or may not even be in business anymore. There's a ton of scenarios where this makes sense IMO.
__________________
Quote:
Please dont deal in absolutes.
Everything in the verse is percentages. Everything.
-With the exception of the love for our children.
(cytg 2001)
|
|
|
|
11-15-2012, 03:00 PM
|
#18
|
|
Diamond Member
Join Date: May 2011
Posts: 3,183
|
I really don't understand the 10 watt thing. I have a Penryn CULV notebook that has a 10W CPU. It is a die shrink of a 65nm core originally designed about 8 years ago. In that time, we have quadrupled the transistor budget and reduced voltage by 20%. Everything seems to indicate we should be able to get 2.0GHz Core 2 CPU performance, plus i3-2310M GPU performance, from a 5 watt package. If Intel cannot raise the bar at least to that level after 8 years, they deserve to bleed another billion or two to Apple.
__________________
I am looking for a cheap upgrade to my 3 year old computer.
AT forum member #1: Buy a 3770k
I am looking for a way to get 10 more fps in TF2.
AT forum member #2: Buy a 3770k
|
|
|
11-15-2012, 03:09 PM
|
#19
|
|
Administrator Elite Member
Join Date: Oct 1999
Posts: 19,182
|
Quote:
Originally Posted by sm625
I really don't understand the 10 watt thing. I have a Penryn CULV notebook that has a 10W CPU. It is a die shrink of a 65nm core originally designed about 8 years ago. In that time, we have quadrupled the transistor budget and reduced voltage by 20%. Everything seems to indicate we should be able to get 2.0GHz Core 2 CPU performance, plus i3-2310M GPU performance, from a 5 watt package. If Intel cannot raise the bar at least to that level after 8 years, they deserve to bleed another billion or two to Apple.
|
The difference of course is the amount of work, i.e. compute, being done with those 10 watts.
Haswell at 10W will probably do 2-3x the total calculations per second that a 10W CULV Penryn would achieve.
The power is definitely a challenge though, apparently. My 3770k, for example, when I underclock it to the lowest multi (16x) and optimize the voltage to be as low as possible while remaining stable for LinX operation, still consumes 12-13W (at 1.6GHz, 0.636V, 36°C).
That means at most my 3770k could be clocked at about 1.2GHz if it were to fit inside a 10W power envelope. It isn't easy to scale down, which is why Atom was created.
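To spell out that estimate, here is the arithmetic as a tiny Python sketch. It assumes the voltage is already at its floor, so power scales roughly linearly with clock, and it ignores static/leakage power; both are simplifications, and the function name is hypothetical.

```python
def max_clock_ghz(measured_ghz, measured_watts, budget_watts):
    """Scale clock linearly with power, assuming voltage cannot drop any further."""
    return measured_ghz * budget_watts / measured_watts

# Measured range from the post: 12-13 W at 1.6 GHz; target envelope: 10 W.
for watts in (12.0, 13.0):
    ghz = max_clock_ghz(1.6, watts, 10.0)
    print(f"{watts:.0f} W at 1.6 GHz -> {ghz:.2f} GHz at 10 W")
```

That gives roughly 1.33GHz from the 12W reading and 1.23GHz from the 13W reading, so about 1.2GHz is the conservative end of the range.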
|
|
|
11-15-2012, 03:52 PM
|
#20
|
|
Golden Member
Join Date: Jan 2011
Location: USA
Posts: 1,451
|
This is one of the reasons I'm very excited about the new Atoms. For many tasks, C2D level performance is fine, and I bet Atom will be able to reach lower power levels than Haswell will.
That being said, Haswell looks like it'll be fantastic for Ultraportables.
|
|
|
11-15-2012, 04:09 PM
|
#21
|
|
Golden Member
Join Date: Jan 2011
Posts: 1,978
|
He says he estimates Haswell to have 10% better performance than Sandy Bridge for current software. Does that mean 5% better than Ivy Bridge then? I'm hoping for a beast gaming chip that will ruin my 3930k. I fear that this won't happen.
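Quick sanity check on that, as a Python sketch. The 5% Ivy Bridge over Sandy Bridge figure is only the assumption implied by the question, not a measured number, and the function name is hypothetical. Gains compound, so you divide rather than subtract:

```python
def relative_gain(gain_over_base, other_gain_over_base):
    """Gain of chip A over chip B when both gains are quoted against the same baseline."""
    return (1 + gain_over_base) / (1 + other_gain_over_base) - 1

# Haswell assumed +10% over Sandy Bridge, Ivy Bridge assumed +5% over Sandy Bridge.
print(f"{relative_gain(0.10, 0.05):.1%}")
```

Under those assumptions it works out to a bit under 5%.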
__________________
Info about expensive garbage goes here
|
|
|
11-15-2012, 04:10 PM
|
#22
|
|
Junior Member
Join Date: Nov 2012
Location: New York
Posts: 24
|
Hey guys... I am new here, so please go easy on the judgment.
Question: when will Haswell be available?
I'm about to purchase a set of SSDs (Samsung 840 Pro x2 128GB RAID0) but would rather wait for this bad boy instead.
Last edited by fov001; 11-15-2012 at 04:20 PM.
|
|
|
11-15-2012, 04:16 PM
|
#23
|
|
Lifer
Join Date: Aug 2002
Posts: 21,056
|
Probably 6+ months.
__________________
CPU: Q3570K @ 4.1GHz 1.23v // Mobo: Asus P8Z77-V // GFX: Radeon HD7950 @ 980/5300 // RAM: Corsair DDR3 @ 1600MHz 9-9-9-24 // SSD: Samsung 830 128GB
Video cards: TNT2, Ti4400, 9800, 7800GT(+7200GS), HD4850(+HD2400), HD6850, HD7950 (Laptops: GF6150, HD3200, GMA500)
|
|
|
11-15-2012, 04:18 PM
|
#24
|
|
Junior Member
Join Date: Nov 2012
Location: New York
Posts: 24
|
Quote:
Originally Posted by Lonyo
Probably 6+ months.
|
Oops. I was actually talking about the Intel S3700. Wrong thread, lol.
|
|
|
11-15-2012, 07:02 PM
|
#25
|
|
Senior Member
Join Date: Sep 2010
Posts: 561
|
Quote:
Originally Posted by moonbogg
He says he estimates Haswell to have 10% better performance than Sandy Bridge for current software. Does that mean 5% better than Ivy Bridge then? I'm hoping for a beast gaming chip that will ruin my 3930k. I fear that this won't happen.
|
He mentions branch prediction improvements and memory improvements. Those are probably the two biggest enhancements.
We'll see what Haswell can do with clocks on 22nm since Intel says they opted for execution speed over IC quality on IVB.
It would be nice to see them do something with cache sizes next time. That may at least improve it some more, even though that's probably not completely efficient. Though I did hear Haswell might have 1MB L2 caches, which would be kind of nice for us gamers.
|
|
|
|
|
|
All times are GMT -5. The time now is 02:10 AM.
|