AnandTech Forums > Hardware and Technology > CPUs and Overclocking


Closed Thread
02-08-2013, 06:02 AM   #1301
NTMBK
Diamond Member
Join Date: Nov 2011
Posts: 4,756

Quote:
Originally Posted by Abwx
Still trashing AMD without even checking the instructions in question?

It wasn't adopted because Intel wants to keep its grip on the instruction sets, nothing else, but they don't mind using some of those instructions after rebranding them with in-house names.



http://en.wikipedia.org/wiki/3DNow!
That wasn't the same instruction. 3DNow! had a horizontal add over 2 floats (PFACC), whereas SSE3's HADDPS works on 4 floats. It's a totally different instruction, like the difference between SSE2 and AVX2. http://msdn.microsoft.com/en-us/libr...v=vs.100).aspx
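
For readers unfamiliar with the two instructions, here is a minimal Python sketch of their lane semantics, assuming the documented behavior of 3DNow! PFACC (two floats per 64-bit register) and SSE3 HADDPS (four floats per 128-bit register). Plain lists stand in for registers and the function names simply mirror the mnemonics, so this only shows which lanes get summed, not timing or exact rounding.

```python
from typing import List


def pfacc(dst: List[float], src: List[float]) -> List[float]:
    """3DNow! PFACC: 2-float registers, result = [dst0 + dst1, src0 + src1]."""
    assert len(dst) == 2 and len(src) == 2
    return [dst[0] + dst[1], src[0] + src[1]]


def haddps(a: List[float], b: List[float]) -> List[float]:
    """SSE3 HADDPS: 4-float registers, result = [a0+a1, a2+a3, b0+b1, b2+b3]."""
    assert len(a) == 4 and len(b) == 4
    return [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]


if __name__ == "__main__":
    print(pfacc([1.0, 2.0], [3.0, 4.0]))
    print(haddps([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]))
```

Both perform pairwise horizontal sums, but on different register widths and lane counts, which is why they are separate instructions.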

ShintaiDK: I wasn't saying that AMD's extensions are going to become widely used at all, just pointing out that they are compatible with the VEX encoding.

EDIT: That said, given that AMD has won the next-gen console contracts, I at least expect the instructions supported by Jaguar to become more widely used in console games and compilers, if not in Windows applications. http://semiaccurate.com/assets/uploa...lide-1-728.jpg The most useful one (FMA4) isn't implemented, though.
__________________
Quote:
Originally Posted by Maximilian
I like my VRMs how I like my hookers, hot and Taiwanese.

Last edited by NTMBK; 02-08-2013 at 06:07 AM.
02-08-2013, 06:17 AM   #1302
ShintaiDK
Lifer
Join Date: Apr 2012
Location: Copenhagen
Posts: 11,364

Not going so well for Abwx

NTMBK: But Intel's compilers already generate the fastest code for AMD CPUs. And Jaguar, assuming it will be used in the consoles, also supports AVX. And if you want to port the games as well, to a PC segment that is outgrowing the consoles, why use anything but universal instructions and the fastest code-generating compiler there is?
__________________
Quote:
Originally Posted by Idontcare
Competition is good at driving the pace of innovation, but it is an inefficient mechanism (R&D expenditures summed across a given industry) for generating the innovation.
02-08-2013, 06:22 AM   #1303
NTMBK
Diamond Member
Join Date: Nov 2011
Posts: 4,756

Quote:
Originally Posted by ShintaiDK
Not going so well for Abwx

NTMBK: But Intel's compilers already generate the fastest code for AMD CPUs. And Jaguar, assuming it will be used in the consoles, also supports AVX. And if you want to port the games as well, to a PC segment that is outgrowing the consoles, why use anything but universal instructions and the fastest code-generating compiler there is?
I'm honestly not sure what the compiler situation will be. Intel would probably give the best results at first, I agree. But I can't see MS being happy not to use every instruction the processor offers when they have a world-class x86-64 compiler and team; it's certainly possible that they would bring out an updated compiler tuned very specifically for the Jaguar cores. Look at the PS3 and 360: dev tools improved significantly over the course of their lives, as the compiler teams got to grips with tuning code for the in-order PowerPC cores. It could honestly go either way, though.
02-08-2013, 07:04 AM   #1304
Abwx
Diamond Member
Join Date: Apr 2011
Posts: 5,000

Quote:
Originally Posted by NTMBK
That wasn't the same instruction. 3DNow! had a horizontal add over 2 floats (PFACC), whereas SSE3's HADDPS works on 4 floats. It's a totally different instruction, like the difference between SSE2 and AVX2. http://msdn.microsoft.com/en-us/libr...v=vs.100).aspx
My bad for the wording. What I wanted to point out is that Intel used a feature already present in 3DNow!, not that SSE3 has anything to do with 3DNow! itself. I guess ShintaiDK eagerly jumped on the interpretation that suits his agenda, though...
02-08-2013, 09:46 AM   #1305
SocketF
Senior Member
Join Date: Jun 2006
Posts: 235

Quote:
Originally Posted by ShintaiDK
NTMBK: But Intel's compilers already generate the fastest code for AMD CPUs. And Jaguar, assuming it will be used in the consoles, also supports AVX. And if you want to port the games as well, to a PC segment that is outgrowing the consoles, why use anything but universal instructions and the fastest code-generating compiler there is?
Concerning the console wins, I wonder whether they will prevent a fast move to AVX256. Like the Bulldozer cores, Jaguar splits AVX256 instructions into two 128-bit pieces, so it is better to stick with AVX128 from the beginning.
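
A toy Python model of the splitting described above, assuming a core whose FP units are 128 bits wide: one 256-bit add over 8 floats is cracked into a low and a high 128-bit half, each issued as its own micro-op. The names crack_256 and add128 are hypothetical, and the model ignores decode, scheduling and latency; it only shows why, under that assumption, a 256-bit instruction buys no extra execution throughput over two AVX128 instructions.

```python
from typing import Callable, List, Tuple

Vec = List[float]


def add128(x: Vec, y: Vec) -> Vec:
    """One 128-bit single-precision add: 4 lanes."""
    assert len(x) == 4 and len(y) == 4
    return [p + q for p, q in zip(x, y)]


def crack_256(op: Callable[[Vec, Vec], Vec], a: Vec, b: Vec) -> Tuple[Vec, int]:
    """Model a 256-bit (8-lane) op on 128-bit hardware: run `op` on the low
    and high 4-lane halves separately. Returns (result, micro-ops issued)."""
    assert len(a) == 8 and len(b) == 8
    lo = op(a[:4], b[:4])  # first 128-bit micro-op
    hi = op(a[4:], b[4:])  # second 128-bit micro-op
    return lo + hi, 2


if __name__ == "__main__":
    a = [float(i) for i in range(8)]
    b = [10.0] * 8
    result, uops = crack_256(add128, a, b)
    # Writing the same work as AVX128 code means two explicit 128-bit adds,
    # i.e. also 2 micro-ops in this model.
    print(result, uops)
```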

Because of the shorter VEX prefix, AVX is still a bit better than the equivalent SSEx instructions; furthermore, you can use 3 operands instead of only 2.
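
To make the operand-count point concrete, here is a small Python interpreter for three real mnemonics (MOVAPS, ADDPS, VADDPS) operating on lists of floats. The interpreter itself (run, SSE_SEQ, AVX_SEQ) is a hypothetical illustration, not a real assembler: the two-operand SSE form overwrites its first operand, so keeping both inputs alive costs an extra register copy, while the VEX three-operand form writes a separate destination.

```python
from typing import Dict, List, Tuple

Regs = Dict[str, List[float]]


def run(program: List[Tuple[str, ...]], regs: Regs) -> Regs:
    """Execute a few SSE/AVX-style register ops on a copy of `regs`."""
    r = {k: list(v) for k, v in regs.items()}  # do not mutate the caller's state
    for ins in program:
        op, *args = ins
        if op == "movaps":    # dst = src
            dst, src = args
            r[dst] = list(r[src])
        elif op == "addps":   # dst = dst + src (destructive two-operand form)
            dst, src = args
            r[dst] = [x + y for x, y in zip(r[dst], r[src])]
        elif op == "vaddps":  # dst = src1 + src2 (non-destructive three-operand form)
            dst, s1, s2 = args
            r[dst] = [x + y for x, y in zip(r[s1], r[s2])]
        else:
            raise ValueError(f"unknown op {op}")
    return r


# Goal: xmm2 = xmm0 + xmm1 while keeping xmm0 and xmm1 for later use.
SSE_SEQ = [("movaps", "xmm2", "xmm0"), ("addps", "xmm2", "xmm1")]
AVX_SEQ = [("vaddps", "xmm2", "xmm0", "xmm1")]

if __name__ == "__main__":
    start = {"xmm0": [1.0, 2.0, 3.0, 4.0], "xmm1": [10.0] * 4, "xmm2": [0.0] * 4}
    for name, seq in (("SSE", SSE_SEQ), ("AVX", AVX_SEQ)):
        out = run(seq, start)
        print(name, len(seq), "instruction(s):", out["xmm2"])
```

Both sequences produce the same result; the AVX form simply needs one instruction instead of two.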

So it might help PCs with FX processors a bit, depending on how much effort the publishers put into the PC ports.
02-08-2013, 12:12 PM   #1306
Olikan
Golden Member
Join Date: Sep 2011
Posts: 1,829

Quote:
Originally Posted by NUSNA_Moebius
Oh? I thought it was an improved Piledriver? LOL......
Oops!

2 years' delay
__________________
Quote:
I must be dyslexic, because every time I look at your name I see OilKan!
02-08-2013, 12:32 PM   #1307
ShintaiDK
Lifer
Join Date: Apr 2012
Location: Copenhagen
Posts: 11,364

Quote:
Originally Posted by SocketF
Concerning the console wins, I wonder whether they will prevent a fast move to AVX256. Like the Bulldozer cores, Jaguar splits AVX256 instructions into two 128-bit pieces, so it is better to stick with AVX128 from the beginning.

Because of the shorter VEX prefix, AVX is still a bit better than the equivalent SSEx instructions; furthermore, you can use 3 operands instead of only 2.

So it might help PCs with FX processors a bit, depending on how much effort the publishers put into the PC ports.
There is a benefit, even when using 2 cycles to execute it.

We also first got single-cycle SSE with Core 2, and SSE was heavily used before that.

Last edited by ShintaiDK; 02-08-2013 at 12:34 PM.
02-08-2013, 02:13 PM   #1308
SocketF
Senior Member
Join Date: Jun 2006
Posts: 235

Quote:
Originally Posted by ShintaiDK
There is a benefit, even when using 2 cycles to execute it.
Which one? AMD changed their part of the GCC compiler to generate AVX128 only. Guess why... yes, it is a few % faster...
Quote:
We also first got single-cycle SSE with Core 2, and SSE was heavily used before that.
That's not the point. The point is that the consoles will probably have Jaguar with AVX128 only, and that won't change now. Intel already has single-cycle AVX256 execution today, but that is of no concern: Jaguar is inside, not Intel.

Knowing that the console's single-thread performance will be quite low due to low clocks, I assume that console programmers will try to squeeze everything out of the Jaguar cores, i.e. use AVX128 only, not 256.

Last edited by SocketF; 02-08-2013 at 02:53 PM. Reason: IPC -> Performance
02-08-2013, 02:47 PM   #1309
ShintaiDK
Lifer
Join Date: Apr 2012
Location: Copenhagen
Posts: 11,364

Quote:
Originally Posted by SocketF
Which one? AMD changed their part of the GCC compiler to generate AVX128 only. Guess why... yes, it is a few % faster...
That's not the point. The point is that the consoles will probably have Jaguar with AVX128 only, and that won't change now. Intel already has single-cycle AVX256 execution today, but that is of no concern: Jaguar is inside, not Intel.

Knowing that the console's single-thread IPC will be quite low due to low clocks, I assume that console programmers will try to squeeze everything out of the Jaguar cores, i.e. use AVX128 only, not 256.
Haswell is not released yet, so there is no single-cycle 256-bit AVX yet.
02-08-2013, 02:50 PM   #1310
Phynaz
Diamond Member
Join Date: Mar 2006
Posts: 6,689

Quote:
Originally Posted by SocketF
Knowing that the console's single-thread IPC will be quite low, due to low clocks
Methinks you are confused.
02-08-2013, 02:52 PM   #1311
SocketF
Senior Member
Join Date: Jun 2006
Posts: 235

Quote:
Originally Posted by ShintaiDK
Haswell is not released yet, so there is no single-cycle 256-bit AVX yet.
It seems you are mixing up AVX with FMA. AVX has been supported since Sandy Bridge, and yes, 256-bit in one cycle.

But that is not the topic here anyway.
Quote:
Originally Posted by Phynaz
Methinks you are confused.
Sorry, I meant performance; you're right, that doesn't make sense.
02-08-2013, 03:03 PM   #1312
ShintaiDK
Lifer
Join Date: Apr 2012
Location: Copenhagen
Posts: 11,364

Quote:
Originally Posted by SocketF
It seems you are mixing up AVX with FMA. AVX has been supported since Sandy Bridge, and yes, 256-bit in one cycle.

But that is not the topic here anyway.

Sorry, I meant performance; you're right, that doesn't make sense.
256-bit AVX instructions take 2 cycles on SB/IB. The datapaths on those CPUs are also only 128-bit.

If it were single-cycle, the difference would have been huge:
%gain/loss, avx256 vs avx128
(negative % indicates loss, positive % indicates gain)

Benchmark         AMD BD  Intel SB
410.bwaves         -2.34     -1.52
416.gamess         -1.11     -0.30
433.milc            0.47     -1.75
434.zeusmp         -3.61      0.68
435.gromacs        -0.54     -0.38
436.cactusADM     -23.56     21.49
437.leslie3d       -0.44      1.56
444.namd            0.00      0.00
447.dealII         -0.36     -0.23
450.soplex         -0.43     -0.29
453.povray          0.50      3.63
454.calculix       -8.29      1.38
459.GemsFDTD        2.37     -1.54
465.tonto           0.00      0.00
470.lbm             0.00      0.21
481.wrf            -4.80      0.00
482.sphinx3       -10.20     -3.65
SpecFP             -3.29      1.01

400.perlbench       0.93      1.47
401.bzip2           0.60      0.00
403.gcc             0.00      0.00
429.mcf             0.00     -0.36
445.gobmk          -1.03      0.37
456.hmmer          -0.64      0.38
458.sjeng           1.74      0.00
462.libquantum      0.31      0.00
464.h264ref         0.00      0.00
471.omnetpp        -1.27      0.00
473.astar           0.00      0.46
483.xalancbmk       0.51      0.00
SpecINT             0.09      0.19

Last edited by ShintaiDK; 02-08-2013 at 03:05 PM.
02-08-2013, 03:14 PM   #1313
Haserath
Senior Member
Join Date: Sep 2010
Posts: 749

Quote:
Sandy Bridge can sustain a full 16 single precision FLOP/cycle or 8 double precision FLOP/cycle double the capabilities of Nehalem. This guarantees that software which uses AVX will actually see a substantial performance advantage on Sandy Bridge and should spur faster adoption.
http://www.realworldtech.com/sandy-bridge/6/
?
02-08-2013, 03:23 PM   #1314
Abwx
Diamond Member
Join Date: Apr 2011
Posts: 5,000

That's 2 double-precision FLOPs/cycle/core, 2 x 64-bit ops.
02-08-2013, 03:47 PM   #1315
SocketF
Senior Member
Join Date: Jun 2006
Posts: 235

Quote:
Originally Posted by ShintaiDK
256-bit AVX instructions take 2 cycles on SB/IB. The datapaths on those CPUs are also only 128-bit.
I am talking about this:
Quote:
Jaguar is splitting AVX256 instructions into two 128bit pieces
I thought it was clear from that that the topic is decoding, not execution.

Speaking in general about "2-cycle execution" of all AVX256 instructions does not make sense either; e.g. multiplications take many more cycles than additions.

Some comments on your numbers:

a) Using AVX for SpecINT is "suboptimal" because AVX256 is usable only for FP. 256-bit integer AVX is called AVX2, and for that you really do have to wait for Haswell. So I assume there are some reasons for the bad results, but they have nothing to do with 256-bit vs. 128-bit, because for integers there is only 128-bit anyway. Source: http://www.drdobbs.com/tools/intel-a...ctio/231000372

b) For SpecFP: code never consists purely of AVX256 instructions; there is lots of other code. Check out the explanation for the y-cruncher program as an example:
Quote:
Q: Why does AVX (v0.5.5) only give about 10% speedup over SSE4.1 (v0.5.4)? Shouldn't it be double the speed?
A: Unlike the majority of compute-intensive applications, y-cruncher does not exclusively use floating-point. As of v0.5.4, only about 30% of a Pi computation is floating-point bound. The remainder of the time is spent on integer operations and stalling on memory access. So cutting that 30% in half yields little overall speedup. Speeding up the code in this manner exposes more memory bottlenecks - which ends up reducing the speedup to only 10%...

Integer operations can largely be emulated using floating-point (albeit with overhead). But most of the integer work involves carry-propagation, so it is not very vectorizable. For now, integer operations are still faster using the normal integer instructions.
http://www.numberworld.org/y-cruncher/

Conclusion: just because the scores are not doubled does not mean that the execution units are not doubled.
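
The y-cruncher numbers follow Amdahl's law. A minimal Python sketch, using only the figures from the FAQ quoted above (about 30% of the run is FP-bound, and AVX makes that part twice as fast); amdahl_speedup is a generic helper written for illustration, and it deliberately ignores the memory stalls that, per the FAQ, cut the real gain to about 10%.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of the runtime is sped up by `factor`
    and the remaining (1 - fraction) is unchanged (Amdahl's law)."""
    if not 0.0 <= fraction <= 1.0 or factor <= 0.0:
        raise ValueError("fraction must be in [0, 1] and factor > 0")
    return 1.0 / ((1.0 - fraction) + fraction / factor)


if __name__ == "__main__":
    s = amdahl_speedup(0.30, 2.0)            # FP part made 2x faster
    cap = amdahl_speedup(0.30, float("inf"))  # FP part made "free"
    print(f"2x FP: {s:.3f}x overall, upper bound: {cap:.3f}x")
```

Doubling FP throughput gives at most 1/0.85, about 1.18x overall, and even infinitely fast FP code caps out at 1/0.7, about 1.43x. Doubled execution units therefore need not show up as doubled scores.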

Last edited by SocketF; 02-08-2013 at 03:52 PM.
02-08-2013, 03:50 PM   #1316
SocketF
Senior Member
Join Date: Jun 2006
Posts: 235

Quote:
Originally Posted by Abwx
That's 2 double-precision FLOPs/cycle/core, 2 x 64-bit ops.
No, the numbers are already for one core only. The article is about the Sandy Bridge architecture, not about some Sandy Bridge quad-core.
Edit: this is the important part:
Quote:
As Figure 5 above indicates, Sandy Bridge can execute a 256-bit FP multiply, a 256-bit FP add and a 256-bit shuffle every cycle.
Furthermore, there are also Sandy Bridge Xeons with 8 cores... think about what it would mean if you applied your math in that case, too ;-)
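
The per-core figure can be checked with simple arithmetic: a 256-bit register holds 8 single-precision or 4 double-precision values, and the quoted article says Sandy Bridge can issue one 256-bit FP multiply and one 256-bit FP add per cycle. The Python helper below (peak_flops_per_cycle, a hypothetical name) just multiplies those factors. It counts peak arithmetic throughput only, leaves the shuffle out because shuffles are not FLOPs, and ignores the load/store limits that can keep real code below the peak.

```python
def peak_flops_per_cycle(vector_bits: int, element_bits: int,
                         fp_ops_per_cycle: int, cores: int = 1) -> int:
    """Peak FLOP/cycle = lanes per vector * vector FP ops per cycle * cores."""
    lanes = vector_bits // element_bits
    return lanes * fp_ops_per_cycle * cores


# Per the quoted article: one 256-bit multiply + one 256-bit add per cycle per core.
SB_VECTOR_BITS = 256
SB_FP_OPS_PER_CYCLE = 2

if __name__ == "__main__":
    sp = peak_flops_per_cycle(SB_VECTOR_BITS, 32, SB_FP_OPS_PER_CYCLE)
    dp = peak_flops_per_cycle(SB_VECTOR_BITS, 64, SB_FP_OPS_PER_CYCLE)
    dp_8core = peak_flops_per_cycle(SB_VECTOR_BITS, 64, SB_FP_OPS_PER_CYCLE, cores=8)
    print(sp, dp, dp_8core)
```

This reproduces the article's 16 SP / 8 DP FLOP per cycle for a single core; an 8-core Xeon would be 8 times that.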

Last edited by SocketF; 02-08-2013 at 03:54 PM.
02-08-2013, 04:32 PM   #1317
inf64
Platinum Member
Join Date: Mar 2011
Posts: 2,075

Are we sure SB/IB can sustain 2x 256-bit ops per core per cycle? Isn't it limited by the effective load/store (L/S) bandwidth?

Anyway, as SocketF said, Jaguar is inside the next-gen consoles (or SR), and it supports basic AVX1.1. Devs will probably use AVX128, but that's not a big issue, since I doubt games would benefit much from 256-bit FP ops anyway. Game code is usually integer-heavy and branch-heavy, so 256-bit FP ops are probably useless there (apart from maybe some physics on the CPU, but they have a GCN core on the die anyway, which can do a better job).
__________________
ShintaiDK:"There will be no APU in PS4 and Xbox720."
ShintaiDK:"No quadchannel either.[in Kaveri]"
CHADBOGA:"Because he[OBR] is a great man."
02-08-2013, 04:42 PM   #1318
Nemesis 1
Banned
Join Date: Dec 2006
Posts: 11,379

Quote:
Originally Posted by NTMBK
Actually, AMD are already using VEX encoding - they changed their proposed "SSE5" instructions to match the new encoding, for instance in their FMA4 instructions.
What, you guys just want to ignore the facts? AMD has its own prefix. They do not have, nor will they ever have, the VEC prefix; it's Intel-exclusive and works with Intel software/hardware only, for auto-recompile, or at runtime I believe. No more red herrings. I spent too much time debating this with the AMD fellow who spread his good cheer on this very forum and got me banned for a few days. Truth always rises to the surface. Of course the general public didn't know about AVX2. I see it as Intel's old Mitosis: the same-elements instruction set coupled with hardware/software for 2x+ performance increases, less in some cases. But what the hell do I know; I can't spell and my grammar sucks. No one takes someone who can't spell and has poor grammar seriously. Hiding in the open.

Last edited by Nemesis 1; 02-08-2013 at 04:44 PM.
02-08-2013, 04:46 PM   #1319
inf64
Platinum Member
Join Date: Mar 2011
Posts: 2,075

AMD supports the same VEX encoding as Intel does... How do you think it can run AVX code, by magic pixie dust? What AMD does differently is the encoding for its own proprietary XOP ISA, a set of unique media instructions it has built.
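
A minimal Python sketch of how a two-byte VEX prefix (0xC5) is assembled, following the field layout in the Intel and AMD manuals: the byte after 0xC5 packs the inverted REX.R bit, the inverted vvvv first-source register, the vector length L and the implied-prefix field pp. The function name vex2_vaddps is hypothetical, and the sketch only handles register-to-register VADDPS with registers 0 to 7. The point is that the bytes follow from the encoding specification alone, whichever vendor's CPU decodes them.

```python
def vex2_vaddps(dst: int, src1: int, src2: int, ymm: bool) -> bytes:
    """Encode VADDPS dst, src1, src2 (register form) with a 2-byte VEX prefix.

    Byte 0: 0xC5 (2-byte VEX escape)
    Byte 1: R' (inverted REX.R) | vvvv' (inverted src1) | L | pp
    Byte 2: opcode 0x58 (ADDPS/VADDPS, 0F opcode map)
    Byte 3: ModRM with mod=11 (register), reg=dst, rm=src2
    """
    for r in (dst, src1, src2):
        if not 0 <= r <= 7:
            raise ValueError("this sketch only handles xmm0-7 / ymm0-7")
    r_bar = 1                   # REX.R = 0 because dst < 8; stored inverted
    vvvv_bar = (~src1) & 0xF    # first source register, stored inverted
    length = 1 if ymm else 0    # L: 0 = 128-bit, 1 = 256-bit
    pp = 0b00                   # no implied 66/F3/F2 prefix for *PS ops
    byte1 = (r_bar << 7) | (vvvv_bar << 3) | (length << 2) | pp
    modrm = (0b11 << 6) | (dst << 3) | src2
    return bytes([0xC5, byte1, 0x58, modrm])


if __name__ == "__main__":
    for args in ((0, 0, 1, False), (0, 1, 2, True)):
        print(" ".join(f"{b:02x}" for b in vex2_vaddps(*args)))
```

These produce C5 F8 58 C1 for vaddps xmm0, xmm0, xmm1 and C5 F4 58 C2 for vaddps ymm0, ymm1, ymm2, the same bytes an assembler emits for either vendor's CPUs.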
02-08-2013, 05:04 PM   #1320
Nemesis 1
Banned
Join Date: Dec 2006
Posts: 11,379

Quote:
Originally Posted by inf64
AMD supports the same VEX encoding as Intel does... How do you think it can run AVX code, by magic pixie dust? What AMD does differently is the encoding for its own proprietary XOP ISA, a set of unique media instructions it has built.
JUST STOP! I am not talking about the instruction set. The VEC or VEX prefix is an Intel exclusive. It works with Intel hardware and software together. AMD does not have this software or hardware. They may have a prefix, but it's not VEC. It's for auto-recompile at run time, I believe; not sure. So now you're going to say AMD has Intel compilers. I know they can run Intel compilers, but it won't work the same, and it's legal. This is not the result AMD and NV wanted when they complained to the FTC about Intel's compilers. The change Intel had to make to keep the FTC happy was to label the compilers as not performing as well on non-Intel products. A big win for Intel.

Last edited by Nemesis 1; 02-08-2013 at 05:11 PM.
02-08-2013, 05:06 PM   #1321
Nemesis 1
Banned
Join Date: Dec 2006
Posts: 11,379

Quote:
Originally Posted by inf64
AMD supports the same VEX encoding as Intel does... How do you think it can run AVX code, by magic pixie dust? What AMD does differently is the encoding for its own proprietary XOP ISA, a set of unique media instructions it has built.
MODS, I WANT NO MORE RED HERRINGS FROM THIS MAN. The info is freely available, and he is not telling the truth, as we have all witnessed since 2006.

Last edited by Nemesis 1; 02-08-2013 at 05:08 PM.
02-08-2013, 05:08 PM   #1322
SocketF
Senior Member
Join Date: Jun 2006
Posts: 235

As long as you don't use Intel's compiler (there are lots of others), you can use AVX, including VEX-prefixed instructions, on AMD chips too. They are 100% compatible.

Intel should soon also provide a compiler option for "slow-AMD-AVX" code. Even if it is slower, it will of course use the VEX prefix. Prefixes are hardware; you are talking about software.

Funny side note: the y-cruncher programmer mentioned above stated that Microsoft's compiler generates better/faster AVX256 code for Intel CPUs than Intel's compiler ;-)
02-08-2013, 05:29 PM   #1323
inf64
Platinum Member
Join Date: Mar 2011
Posts: 2,075

Oh man, my fail for even trying to respond to that poster. Won't happen again; let him fall into the hole he dug himself.

For those interested in the VEX prefix/coding scheme, Wikipedia is the easiest source.
Quote:
History

  • In August 2007, AMD proposed the SSE5 instruction set extension which includes a new coding scheme for instructions with three operands, using an extra byte named DREX intended for the Bulldozer processor core, due to begin production in 2011.[2][3]
  • In March 2008, Intel proposed the AVX instruction set, using the new VEX coding scheme.[4]
  • In August 2008, commentators deplored the expected incompatibility between AMD and Intel instruction sets, and proposed that AMD revise their plans and replace the DREX scheme with the more flexible and extensible VEX scheme.[5]
  • In May 2009, AMD announced a revision of the proposed SSE5 instruction set to make it compatible with the AVX instruction set and the VEX coding scheme. The revised SSE5 is called XOP.[6]
  • January 2011. The AVX instruction set is supported in Intel's Sandy Bridge microprocessor architecture.
  • 2011. The AVX, XOP and FMA4 instruction sets, all using the VEX scheme, are supported in the AMD Bulldozer processor.[7]
  • Unknown date. The FMA3 instruction set, but possibly not FMA4, will be supported in Intel processors.
PS: The guy cannot tell the difference between a compiler (a piece of software) and an instruction coding standard (which is what VEX is)...
02-08-2013, 05:34 PM   #1324
Nemesis 1
Banned
Join Date: Dec 2006
Posts: 11,379
Quote:
Originally Posted by SocketF
As long as you don't use Intel's compiler (there are lots of others), you can use AVX, including VEX-prefixed instructions, on AMD chips too. They are 100% compatible.

Intel should soon also provide a compiler option for "slow-AMD-AVX" code. Even if it is slower, it will of course use the VEX prefix. Prefixes are hardware; you are talking about software.

Funny side note: the y-cruncher programmer mentioned above stated that Microsoft's compiler generates better/faster AVX256 code for Intel CPUs than Intel's compiler ;-)
NO, you cannot. You go now and get a link. The VEC prefix does nothing on an AMD machine; AMD has its own prefix.

Last edited by Nemesis 1; 02-08-2013 at 05:38 PM.
02-08-2013, 05:37 PM   #1325
Nemesis 1
Banned
Join Date: Dec 2006
Posts: 11,379

Quote:
Originally Posted by inf64
Oh man, my fail for even trying to respond to that poster. Won't happen again; let him fall into the hole he dug himself.

For those interested in the VEX prefix/coding scheme, Wikipedia is the easiest source.


PS: The guy cannot tell the difference between a compiler (a piece of software) and an instruction coding standard (which is what VEX is)...
Man oh man. YOU'RE lying. Tell the mods I said that; it should be a ban, unless you're lying.

Nemesis, my day; nay, my life is too short to be dealing with whatever crazy you're peddling right now. For thread crapping and yelling like a madman, please take the next week off
-ViRGE

Last edited by ViRGE; 02-08-2013 at 08:49 PM.

All times are GMT -5.