02-08-2013, 05:02 AM
|
#1301
|
|
Golden Member
Join Date: Nov 2011
Posts: 1,657
|
Quote:
Originally Posted by Abwx
Still trashing AMD without even checking the instructions in question?
It wasn't adopted because Intel wants to maintain its grip on the
instruction sets, nothing else, but they do not mind using some
of those instructions after rebranding them with in-house names.
http://en.wikipedia.org/wiki/3DNow!
|
That wasn't the same instruction. 3DNow! had a horizontal add for 2 floats, whereas SSE3's hadd works on 4 floats. Totally different instruction; it's like the difference between SSE2 and AVX2. http://msdn.microsoft.com/en-us/libr...v=vs.100).aspx
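To make the 2-float vs. 4-float difference concrete, here is a rough Python model of the two horizontal adds. It is only a sketch of the lane semantics, assuming the instructions meant are PFACC (3DNow!) and HADDPS (SSE3); plain lists stand in for registers, and the function names are made up for illustration.

```python
def pfacc(dst, src):
    """Model of 3DNow! PFACC: two packed floats per 64-bit MMX register."""
    assert len(dst) == 2 and len(src) == 2
    # Low lane sums the destination pair, high lane sums the source pair.
    return [dst[0] + dst[1], src[0] + src[1]]


def haddps(dst, src):
    """Model of SSE3 HADDPS: four packed floats per 128-bit XMM register."""
    assert len(dst) == 4 and len(src) == 4
    # Two pair sums from the destination, then two pair sums from the source.
    return [dst[0] + dst[1], dst[2] + dst[3],
            src[0] + src[1], src[2] + src[3]]


print(pfacc([1.0, 2.0], [3.0, 4.0]))
print(haddps([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]))
```

Same idea (add neighbouring lanes), but a different register file, width and opcode, so code written for one does not run as the other.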
ShintaiDK- I wasn't saying that AMD's extensions are going to become widely used at all, simply pointing out that they are compatible with the VEX encoding.
EDIT: Although given that AMD have won the next gen console contracts, I at least expect the instructions used in Jaguar to become more widely used in console games and compilers, if not Windows applications. http://semiaccurate.com/assets/uploa...lide-1-728.jpg Not that the most useful one (FMA4) is implemented.
__________________
Main rig Phenom II X4 960T, 4GB DDR2, XFX HD 7770
Old skool 2 x 3GHz Xeon (Hyperthreaded), 2GB RDRAM, HIS AGP HD4670
Last edited by NTMBK; 02-08-2013 at 05:07 AM.
|
|
|
02-08-2013, 05:17 AM
|
#1302
|
|
Diamond Member
Join Date: Apr 2012
Location: Copenhagen
Posts: 5,581
|
Not going so well for Abwx
NTMBK: But Intel's compilers already generate the fastest code for AMD CPUs, and Jaguar, assuming it is used in the consoles, also supports AVX. And if you want to port games to the PC as well, a segment that is outgrowing consoles, why use anything but universal instructions and the fastest code-generating compiler there is?
__________________
MiniITX
CPU - i5 3570K
Board - Intel DH77DF
SSD - Intel 320 300GB
Memory - G.Skill Ares 2x8GB 1600Mhz
Case - Sugo SG08B with 600W PSU
GPU - Zotac GTX 680 2GB
|
|
|
02-08-2013, 05:22 AM
|
#1303
|
|
Golden Member
Join Date: Nov 2011
Posts: 1,657
|
Quote:
Originally Posted by ShintaiDK
Not going so well for Abwx
NTMBK: But Intel's compilers already generate the fastest code for AMD CPUs, and Jaguar, assuming it is used in the consoles, also supports AVX. And if you want to port games to the PC as well, a segment that is outgrowing consoles, why use anything but universal instructions and the fastest code-generating compiler there is?
|
I'm honestly not sure what the compiler situation would be. Intel would probably give the best results at first, I agree. But I can't see MS being happy to leave any of the processor's instructions unused when they have a world-class x86-64 compiler and team. It's certainly possible that they would bring out an updated compiler tuned very specifically to the Jaguar cores. Look at the situation with the PS3 and 360: dev tools improved significantly over the course of their lives, as the compiler teams got to grips with tuning code for the in-order PowerPC cores. It could honestly go either way, though.
|
|
|
02-08-2013, 06:04 AM
|
#1304
|
|
Platinum Member
Join Date: Apr 2011
Posts: 2,098
|
Quote:
Originally Posted by NTMBK
|
My bad for the wording. What I wanted to point out is that Intel
used a feature already present in 3DNow!, not that SSE3
has any relation to 3DNow!. I guess ShintaiDK jumped
eagerly on the interpretation that suits his agenda, though.
|
|
|
02-08-2013, 08:46 AM
|
#1305
|
|
Member
Join Date: Jun 2006
Posts: 196
|
Quote:
Originally Posted by ShintaiDK
NTMBK: But Intel's compilers already generate the fastest code for AMD CPUs, and Jaguar, assuming it is used in the consoles, also supports AVX. And if you want to port games to the PC as well, a segment that is outgrowing consoles, why use anything but universal instructions and the fastest code-generating compiler there is?
|
Concerning the console wins, I wonder whether these will prevent a fast move to AVX256. Like the Bulldozer cores, Jaguar splits AVX256 instructions into two 128-bit pieces. Thus it is better to stick with AVX128 from the beginning.
Because of the shorter VEX prefix, AVX is still a bit better than the equivalent SSEx instructions, and furthermore you can use 3 operands instead of only 2.
Thus it might help PCs with FX processors a bit, depending on how much effort the publishers put into the PC ports.
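A rough Python model of the two points above, purely as an illustration: a 256-bit add handled as two 128-bit halves, and the difference between the two-operand SSE form and the three-operand VEX form. Lists stand in for registers, and every name here is hypothetical rather than taken from any real tool.

```python
def add128(a, b):
    """One 128-bit packed single-precision add: 4 lanes."""
    assert len(a) == 4 and len(b) == 4
    return [x + y for x, y in zip(a, b)]


def add256_split(a, b):
    """A 256-bit add (8 lanes) carried out as two 128-bit halves."""
    assert len(a) == 8 and len(b) == 8
    halves = [add128(a[:4], b[:4]), add128(a[4:], b[4:])]
    # Return the joined result and how many 128-bit internal ops it took.
    return halves[0] + halves[1], len(halves)


def two_operand_add(regs, dst, src):
    """SSE style: dst = dst + src, so the old value of dst is lost."""
    regs[dst] = add128(regs[dst], regs[src])


def three_operand_add(regs, dst, src1, src2):
    """VEX style: dst = src1 + src2, both sources survive."""
    regs[dst] = add128(regs[src1], regs[src2])


a = [float(i) for i in range(8)]
b = [10.0] * 8
result, internal_ops = add256_split(a, b)
print(result, internal_ops)

regs = {"xmm0": [0.0] * 4, "xmm1": [1.0, 2.0, 3.0, 4.0], "xmm2": [10.0] * 4}
three_operand_add(regs, "xmm0", "xmm1", "xmm2")
print(regs["xmm0"], regs["xmm1"])
```

With the two-operand form, keeping xmm1 intact would need an extra register copy first, which is the saving the three-operand form gives even at 128 bits.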
|
|
|
02-08-2013, 11:12 AM
|
#1306
|
|
Golden Member
Join Date: Sep 2011
Posts: 1,663
|
Quote:
Originally Posted by NUSNA_Moebius
Oh? I thought it was an improved Piledriver? LOL......
|
ooops!
2 years delay
__________________
Quote:
I must be dyslexic, because every time I look at your name I see OilKan!
|
|
|
|
02-08-2013, 11:32 AM
|
#1307
|
|
Diamond Member
Join Date: Apr 2012
Location: Copenhagen
Posts: 5,581
|
Quote:
Originally Posted by SocketF
Concerning the console wins, I wonder whether these will prevent a fast move to AVX256. Like the Bulldozer cores, Jaguar splits AVX256 instructions into two 128-bit pieces. Thus it is better to stick with AVX128 from the beginning.
Because of the shorter VEX prefix, AVX is still a bit better than the equivalent SSEx instructions, and furthermore you can use 3 operands instead of only 2.
Thus it might help PCs with FX processors a bit, depending on how much effort the publishers put into the PC ports.
|
There is a benefit, even when it takes 2 cycles to execute.
We also only got single-cycle SSE with Core 2, and SSE was heavily used before that.
Last edited by ShintaiDK; 02-08-2013 at 11:34 AM.
|
|
|
02-08-2013, 01:13 PM
|
#1308
|
|
Member
Join Date: Jun 2006
Posts: 196
|
Quote:
Originally Posted by ShintaiDK
There is a benefit, even when using 2 cycles to execute it.
|
Which one? AMD's changed their part of the GCC compiler to generate AVX128 only, guess why .. yes it is a few % faster ...
Quote:
|
We also only got single-cycle SSE with Core 2, and SSE was heavily used before that.
|
That's not the point. The point is that the consoles probably have Jaguar with AVX128 only, and that won't ever change now. Intel already has AVX256 single-cycle execution today, but that is of no concern. Jaguar is inside, not Intel.
Knowing that the consoles' single-thread performance will be quite low, due to low clocks, I assume that the console programmers will try to squeeze everything out of the Jaguar cores, i.e. use AVX128 only, not 256.
Last edited by SocketF; 02-08-2013 at 01:53 PM.
Reason: IPC -> Performance
|
|
|
02-08-2013, 01:47 PM
|
#1309
|
|
Diamond Member
Join Date: Apr 2012
Location: Copenhagen
Posts: 5,581
|
Quote:
Originally Posted by SocketF
Which one? AMD's changed their part of the GCC compiler to generate AVX128 only, guess why .. yes it is a few % faster ...
That's not the point. The point is that the consoles probably have Jaguar with AVX128 only, and that won't ever change now. Intel already has AVX256 single-cycle execution today, but that is of no concern. Jaguar is inside, not Intel.
Knowing that the consoles' single-thread IPC will be quite low, due to low clocks, I assume that the console programmers will try to squeeze everything out of the Jaguar cores, i.e. use AVX128 only, not 256.
|
Haswell is not released yet. So no single cycle 256bit AVX yet.
|
|
|
02-08-2013, 01:50 PM
|
#1310
|
|
Diamond Member
Join Date: Mar 2006
Posts: 5,165
|
Quote:
Originally Posted by SocketF
Knowing that the consoles' single-thread IPC will be quite low, due to low clocks
|
Me thinks you are confused.
|
|
|
02-08-2013, 01:52 PM
|
#1311
|
|
Member
Join Date: Jun 2006
Posts: 196
|
Quote:
Originally Posted by ShintaiDK
Haswell is not released yet. So no single cycle 256bit AVX yet.
|
It seems you are mixing up AVX with FMA. AVX has been supported since Sandy Bridge, and yes, 256-bit in one cycle.
But this is not the topic here anyways.
Quote:
Originally Posted by Phynaz
Me thinks you are confused.
|
Sorry, I meant performance. You are right, that doesn't make sense.
|
|
|
02-08-2013, 02:03 PM
|
#1312
|
|
Diamond Member
Join Date: Apr 2012
Location: Copenhagen
Posts: 5,581
|
Quote:
Originally Posted by SocketF
It seems you are mixing up AVX with FMA. AVX has been supported since Sandy Bridge, and yes, 256-bit in one cycle.
But this is not the topic here anyways.
Sorry, I meant performance. You are right, that doesn't make sense.
|
256-bit AVX instructions take 2 cycles on SB/IB. The datapaths on those CPUs are also only 128-bit.
If it were single-cycle, the difference would have been huge:
% gain/loss, AVX256 vs. AVX128
(negative % indicates loss, positive % indicates gain)

Benchmark         AMD BD   Intel SB
410.bwaves         -2.34      -1.52
416.gamess         -1.11      -0.30
433.milc            0.47      -1.75
434.zeusmp         -3.61       0.68
435.gromacs        -0.54      -0.38
436.cactusADM     -23.56      21.49
437.leslie3d       -0.44       1.56
444.namd            0.00       0.00
447.dealII         -0.36      -0.23
450.soplex         -0.43      -0.29
453.povray          0.50       3.63
454.calculix       -8.29       1.38
459.GemsFDTD        2.37      -1.54
465.tonto           0.00       0.00
470.lbm             0.00       0.21
481.wrf            -4.80       0.00
482.sphinx3       -10.20      -3.65
SpecINT            -3.29       1.01

400.perlbench       0.93       1.47
401.bzip2           0.60       0.00
403.gcc             0.00       0.00
429.mcf             0.00      -0.36
445.gobmk          -1.03       0.37
456.hmmer          -0.64       0.38
458.sjeng           1.74       0.00
462.libquantum      0.31       0.00
464.h264ref         0.00       0.00
471.omnetpp        -1.27       0.00
473.astar           0.00       0.46
483.xalancbmk       0.51       0.00
SpecFP              0.09       0.19
Last edited by ShintaiDK; 02-08-2013 at 02:05 PM.
|
|
|
02-08-2013, 02:14 PM
|
#1313
|
|
Senior Member
Join Date: Sep 2010
Posts: 537
|
Quote:
|
Sandy Bridge can sustain a full 16 single precision FLOP/cycle or 8 double precision FLOP/cycle – double the capabilities of Nehalem. This guarantees that software which uses AVX will actually see a substantial performance advantage on Sandy Bridge and should spur faster adoption.
|
http://www.realworldtech.com/sandy-bridge/6/
?
|
|
|
02-08-2013, 02:23 PM
|
#1314
|
|
Platinum Member
Join Date: Apr 2011
Posts: 2,098
|
That's 2 double-precision FLOPs/cycle/core, 2 x 64-bit ops.
|
|
|
02-08-2013, 02:47 PM
|
#1315
|
|
Member
Join Date: Jun 2006
Posts: 196
|
Quote:
Originally Posted by ShintaiDK
256-bit AVX instructions take 2 cycles on SB/IB. The datapaths on those CPUs are also only 128-bit.
|
I am talking about this:
Quote:
|
Jaguar is splitting AVX256 instructions into two 128bit pieces
|
I thought it was clear from that sentence that the topic is decoding, not execution.
Speaking in general about "2-cycle execution" of AVX256 instructions does not make sense either; e.g. multiplications take many more cycles than additions.
Some comments on your numbers:
a) Using AVX for SpecINT is "suboptimal" because AVX256 is usable only for FP. AVX256 for INT is called AVX2, and for that you really have to wait for Haswell. So I assume there are some reasons for the bad results, but they have nothing to do with 256-bit vs. 128-bit, because there is only 128-bit anyways. Source: http://www.drdobbs.com/tools/intel-a...ctio/231000372
b) For SpecFP: code never consists purely of AVX256 parts. There is lots of other code. Check out the explanation from the y-cruncher program as an example:
Quote:
Q: Why does AVX (v0.5.5) only give about 10% speedup over SSE4.1 (v0.5.4)? Shouldn't it be double the speed?
A: Unlike the majority of compute-intensive applications, y-cruncher does not exclusively use floating-point. As of v0.5.4, only about 30% of a Pi computation is floating-point bound. The remainder of the time is spent on integer operations and stalling on memory access. So cutting that 30% in half yields little overall speedup. Speeding up the code in this manner exposes more memory bottlenecks - which ends up reducing the speedup to only 10%...
Integer operations can be largely be emulated using floating-point (albeit with overhead). But most of the integer work involves carry-propagation, so it is not very vectorizable. For now, integer operations are still faster using the normal integer instructions.
|
http://www.numberworld.org/y-cruncher/
Conclusion: Just because the scores are not doubled does not mean that the execution units are not doubled.
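The arithmetic behind the y-cruncher answer can be sketched with Amdahl's law. This is only an illustration using the 30% figure from the quoted FAQ; the function name is made up, and the memory-bottleneck effect that the FAQ says pulls the result down to about 10% is not modelled.

```python
def overall_speedup(fraction, factor):
    """Amdahl's law: speed up `fraction` of the runtime by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# 30% of the run is floating-point bound; AVX256 at best halves that part.
ideal = overall_speedup(0.30, 2.0)
print(f"best case: {(ideal - 1.0) * 100:.1f}% faster overall")

# Even an infinitely fast FP unit could not beat 1 / 0.70.
limit = overall_speedup(0.30, float("inf"))
print(f"upper bound: {(limit - 1.0) * 100:.1f}% faster overall")
```

So doubling the vector width caps out at roughly 18% for this workload before memory stalls are even considered, which is why a benchmark score that fails to double says little about the width of the execution units.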
Last edited by SocketF; 02-08-2013 at 02:52 PM.
|
|
|
02-08-2013, 02:50 PM
|
#1316
|
|
Member
Join Date: Jun 2006
Posts: 196
|
Quote:
Originally Posted by Abwx
That's 2 double-precision FLOPs/cycle/core, 2 x 64-bit ops.
|
No, the numbers were already for one core only. The article is about the Sandy Bridge architecture, not about some Sandy Bridge quad core.
Edit: That is the important part:
Quote:
|
As Figure 5 above indicates, Sandy Bridge can execute a 256-bit FP multiply, a 256-bit FP add and a 256-bit shuffle every cycle.
|
Furthermore, there are also Sandy Bridge Xeons with 8 cores. Think about what it would mean if you applied your math in that case, too ;-)
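For reference, the per-core figures in the article follow directly from the quoted sentence (one 256-bit FP multiply plus one 256-bit FP add per cycle). A quick sketch of that arithmetic, with a made-up helper name; the 128-bit row assumes the same multiply-plus-add pairing for the previous generation:

```python
def peak_flops_per_cycle(vector_bits, element_bits, ops_per_cycle=2):
    """Lanes per vector times independent FP ops (mul + add) per cycle."""
    lanes = vector_bits // element_bits
    return lanes * ops_per_cycle


for name, bits in (("128-bit SSE", 128), ("256-bit AVX", 256)):
    sp = peak_flops_per_cycle(bits, 32)   # single precision, 32-bit lanes
    dp = peak_flops_per_cycle(bits, 64)   # double precision, 64-bit lanes
    print(f"{name}: {sp} SP FLOP/cycle, {dp} DP FLOP/cycle per core")
```

That gives 16 SP and 8 DP per cycle for 256-bit vectors, double the 128-bit figures, with no core count involved anywhere.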
Last edited by SocketF; 02-08-2013 at 02:54 PM.
|
|
|
02-08-2013, 03:32 PM
|
#1317
|
|
Golden Member
Join Date: Mar 2011
Posts: 1,276
|
Are we sure SB/IB can sustain 2 x 256-bit ops per core per cycle? Isn't it limited by the effective L/S BW?
Anyway, Jaguar, like SocketF said, is inside the next-gen consoles (or SR) and it supports basic AVX1.1. Devs will probably use AVX128, but that's not a big issue since I doubt games would benefit much from 256-bit FP ops anyway. Game code is usually integer-heavy and branch-heavy, so 256-bit FP ops are probably useless there (apart from maybe some physics on the CPU, but they have a GCN core on die anyway, which can do a better job).
__________________
Quote:
Originally Posted by ShintaiDK
There will be no APU in PS4 and Xbox720.
|
CHADBOGA:" Because he[OBR] is a great man."
|
|
|
02-08-2013, 03:42 PM
|
#1318
|
|
Banned
Join Date: Dec 2006
Posts: 11,379
|
Quote:
Originally Posted by NTMBK
Actually, AMD are already using VEX encoding - they changed their proposed "SSE5" instructions to match the new encoding, for instance in their FMA4 instructions.
|
What, you guys just want to ignore the facts? AMD has their own prefix. They do not have, nor will they ever have, the vec prefix; it's Intel exclusive and works with Intel software and hardware only, for auto recompile, or at runtime I believe. No more red herrings. I spent too much time debating this with the AMD fellow who spread his good cheer on this very forum and got me banned for a few days. Truth always rises to the surface. Of course the general public didn't know about AVX2. I see it as Intel's old Mitosis: the same elements, an instruction set coupled with hardware and software for 2x+ performance increases, less in some cases. But what the hell do I know, I can't spell and my grammar sucks. No one takes someone who can't spell and has poor grammar seriously. Hiding in the open
Last edited by Nemesis 1; 02-08-2013 at 03:44 PM.
|
|
|
02-08-2013, 03:46 PM
|
#1319
|
|
Golden Member
Join Date: Mar 2011
Posts: 1,276
|
AMD supports the same VEX encoding as Intel does... How do you think it can run AVX code, by magic pixie dust? What AMD does differently is the encoding for their own proprietary XOP ISA, a unique set of media instructions they have built.
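To show that the VEX prefix is just bits in the instruction stream, here is a small Python sketch that pulls apart a 2-byte VEX prefix. It assumes 64-bit mode (where a leading 0xC5 byte is always VEX), handles only the 2-byte form, and the two byte strings are hand-assembled examples that should correspond to vaddps ymm0, ymm1, ymm2 and vaddps xmm0, xmm1, xmm2; treat them as illustrative. The function name is made up.

```python
def decode_vex2(code):
    """Decode the fields of a 2-byte VEX prefix (first byte 0xC5)."""
    if len(code) < 3 or code[0] != 0xC5:
        raise ValueError("not a 2-byte VEX prefix")
    p = code[1]
    return {
        "R": ((p >> 7) & 1) ^ 1,            # register extension, stored inverted
        "vvvv": (~(p >> 3)) & 0xF,          # extra source register, stored inverted
        "vector_bits": 256 if (p >> 2) & 1 else 128,  # the L bit
        "pp": p & 0b11,                     # implied legacy prefix (0 = none)
        "opcode": code[2],
    }


avx256 = decode_vex2(bytes.fromhex("c5f458c2"))
avx128 = decode_vex2(bytes.fromhex("c5f058c2"))
print(avx256)
print(avx128)
```

Nothing in those bytes names a vendor or a compiler: the only difference between the AVX256 and AVX128 form is the single L bit, and any CPU that implements AVX decodes the same prefix.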
|
|
|
02-08-2013, 04:04 PM
|
#1320
|
|
Banned
Join Date: Dec 2006
Posts: 11,379
|
Quote:
Originally Posted by inf64
AMD supports the same VEX encoding as Intel does... How do you think it can run AVX code, by magic pixie dust? What AMD does differently is the encoding for their own proprietary XOP ISA, a unique set of media instructions they have built.
|
JUST STOP! I am not talking about the instruction set. THE prefix of vec or vex is an Intel exclusive. It works with Intel hardware and software together. AMD does not have this software or hardware. They may have a prefix, but it's not vec. It's for auto recompile, I believe at run time, not sure. So now you're going to say AMD has Intel compilers. I know they can run Intel compilers, but it won't work the same, and that's legal. This is not the result AMD and NV wanted when they complained to the FTC about Intel's compilers. The change Intel had to make to keep the FTC happy was to label the compilers as not performing as well on non-Intel products. A big win for Intel
Last edited by Nemesis 1; 02-08-2013 at 04:11 PM.
|
|
|
02-08-2013, 04:06 PM
|
#1321
|
|
Banned
Join Date: Dec 2006
Posts: 11,379
|
Quote:
Originally Posted by inf64
AMD supports the same VEX encoding as Intel does... How do you think it can run AVX code, by magic pixie dust? What AMD does differently is the encoding for their own proprietary XOP ISA, a unique set of media instructions they have built.
|
MODS, I WANT NO MORE RED HERRINGS BY THIS MAN. The info is freely available and he is not telling the truth, as we have all witnessed since 2006
Last edited by Nemesis 1; 02-08-2013 at 04:08 PM.
|
|
|
02-08-2013, 04:08 PM
|
#1322
|
|
Member
Join Date: Jun 2006
Posts: 196
|
As long as you don't use Intel's compiler (there are lots of others) you can use AVX, including VEX-prefixed instructions, on AMD chips too. They are 100% compatible.
Soon Intel should also provide a compiler option for "slow-AMD-AVX" code. Even if it is slower, it will of course use the VEX prefix. Prefixes are hardware; you are talking about software.
Funny side note: the y-cruncher programmer mentioned above stated that Microsoft's compiler generates better/faster AVX256 code for Intel CPUs than Intel's compiler ;-)
|
|
|
02-08-2013, 04:29 PM
|
#1323
|
|
Golden Member
Join Date: Mar 2011
Posts: 1,276
|
Oh man, my fail for even trying to respond to that poster. Won't happen again, let him fall into the hole he dug himself.
For those who are interested in the VEX prefix/coding scheme, Wikipedia is the easiest source.
Quote:
History
- In August 2007, AMD proposed the SSE5 instruction set extension which includes a new coding scheme for instructions with three operands, using an extra byte named DREX intended for the Bulldozer processor core, due to begin production in 2011.[2][3]
- In March 2008, Intel proposed the AVX instruction set, using the new VEX coding scheme.[4]
- In August 2008, commentators deplored the expected incompatibility between AMD and Intel instruction sets, and proposed that AMD revise their plans and replace the DREX scheme with the more flexible and extensible VEX scheme.[5]
- In May 2009, AMD announced a revision of the proposed SSE5 instruction set to make it compatible with the AVX instruction set and the VEX coding scheme. The revised SSE5 is called XOP.[6]
- January 2011. The AVX instruction set is supported in Intel's Sandy Bridge microprocessor architecture.
- 2011. The AVX, XOP and FMA4 instruction sets, all using the VEX scheme, are supported in the AMD Bulldozer processor.[7]
- Unknown date. The FMA3 instruction set, but possibly not FMA4, will be supported in Intel processors.
|
PS The guy cannot discern what is a compiler ( a piece of software) and what is an instruction coding standard(what VEX is)...
|
|
|
02-08-2013, 04:34 PM
|
#1324
|
|
Banned
Join Date: Dec 2006
Posts: 11,379
|
Quote:
Originally Posted by SocketF
As long as you don't use Intel's compiler (there are lots of others) you can use AVX, including VEX-prefixed instructions, on AMD chips too. They are 100% compatible.
Soon Intel should also provide a compiler option for "slow-AMD-AVX" code. Even if it is slower, it will of course use the VEX prefix. Prefixes are hardware; you are talking about software.
Funny side note: the y-cruncher programmer mentioned above stated that Microsoft's compiler generates better/faster AVX256 code for Intel CPUs than Intel's compiler ;-)
|
NO, you cannot. You go now and get a link. A prefix of vec does nothing on an AMD machine; AMD has its own prefix
Last edited by Nemesis 1; 02-08-2013 at 04:38 PM.
|
|
|
02-08-2013, 04:37 PM
|
#1325
|
|
Banned
Join Date: Dec 2006
Posts: 11,379
|
Quote:
Originally Posted by inf64
Oh man, my fail for even trying to respond to that poster. Won't happen again, let him fall into the hole he dug himself.
For those who are interested in the VEX prefix/coding scheme, Wikipedia is the easiest source.
PS The guy cannot discern what is a compiler ( a piece of software) and what is an instruction coding standard(what VEX is)...
|
Man oh man. YOU'RE lying. Tell the mods I said that. It should be a ban, unless you're lying
Nemesis, my day; nay, my life is too short to be dealing with whatever crazy you're peddling right now. For thread crapping and yelling like a madman, please take the next week off
-ViRGE
Last edited by ViRGE; 02-08-2013 at 07:49 PM.
|
|
|