Richland & Kabini rumours

Discussion in 'CPUs and Overclocking' started by dbcoopernz, Dec 27, 2012.

Thread Status:
Not open for further replies.
  1. Abwx

    Abwx Diamond Member

    Joined:
    Apr 2, 2011
    Messages:
    8,301
    Likes Received:
    174
    Still trashing AMD without even checking the instructions in question?

    It wasn't adopted because Intel wants to maintain its grip on the
    instruction sets, nothing else, but they don't mind using some
    of those instructions after rebranding them with in-house names.

    http://en.wikipedia.org/wiki/3DNow!
     
  2. ShintaiDK

    ShintaiDK Lifer

    Joined:
    Apr 22, 2012
    Messages:
    20,395
    Likes Received:
    128
    3DNow! never became universal. So it's completely irrelevant. Just like SSE4A and SSE5.
     
  3. NTMBK

    NTMBK Diamond Member

    Joined:
    Nov 14, 2011
    Messages:
    7,521
    Likes Received:
    364
    That wasn't the same instruction. 3DNow! had a horizontal add for 2 floats, whereas SSE3 had hadd using 4 floats. Totally different instruction; it's like the difference between SSE2 and AVX2. http://msdn.microsoft.com/en-us/library/yd9wecaa(v=vs.100).aspx
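    To make the lane difference concrete, here is a small Python model of the two instructions, with plain lists standing in for registers (a sketch of the documented semantics, not an emulator): 3DNow! `PFACC` takes two 2-float MMX operands, SSE3 `HADDPS` (the `_mm_hadd_ps` intrinsic) two 4-float XMM operands.

```python
def pfacc(a, b):
    """Model of 3DNow! PFACC: each operand holds 2 floats; each pair is summed."""
    assert len(a) == 2 and len(b) == 2
    return [a[0] + a[1], b[0] + b[1]]


def haddps(a, b):
    """Model of SSE3 HADDPS (_mm_hadd_ps): each operand holds 4 floats.

    Adjacent pairs are summed; the low half of the result comes from `a`,
    the high half from `b`.
    """
    assert len(a) == 4 and len(b) == 4
    return [a[0] + a[1], a[2] + a[3], b[0] + b[1], b[2] + b[3]]


print(pfacc([1.0, 2.0], [3.0, 4.0]))                      # [3.0, 7.0]
print(haddps([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]))  # [3.0, 7.0, 11.0, 15.0]
```

    Both do "add adjacent pairs", but the operand width and result layout differ, so code written for one does not map one-to-one onto the other.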

    ShintaiDK: I wasn't saying that AMD's extensions are going to become widely used at all, simply pointing out that they are compatible with the VEX encoding.

    EDIT: Although, given that AMD has won the next-gen console contracts, I at least expect the instructions used in Jaguar to become more widely used in console games and compilers, if not in Windows applications. http://semiaccurate.com/assets/uploads/2012/08/slide-1-728.jpg That said, the most useful one (FMA4) isn't implemented.
     
    #1303 NTMBK, Feb 8, 2013
    Last edited: Feb 8, 2013
  4. ShintaiDK

    ShintaiDK Lifer

    Joined:
    Apr 22, 2012
    Messages:
    20,395
    Likes Received:
    128
    Not going so well for Abwx ;)

    NTMBK: But Intel's compilers already generate the fastest code for AMD CPUs, and Jaguar, assuming it ends up in the consoles, also supports AVX. If you also want to port games to a PC segment that is outgrowing consoles, why use anything but universal instructions and the fastest code-generating compiler there is?
     
  5. NTMBK

    NTMBK Diamond Member

    Joined:
    Nov 14, 2011
    Messages:
    7,521
    Likes Received:
    364
    I'm honestly not sure what the compiler situation would be. Intel would probably give the best results at first, I agree. But I can't see MS being happy not to use every instruction the processor offers when they have a world-class x86-64 compiler and team; it's certainly possible that they would bring out an updated compiler tuned very specifically to the Jaguar cores. Look at the situation with the PS3 and 360: dev tools improved significantly over the course of their lives, as the compiler teams got to grips with tuning code for the in-order PowerPC cores. It could honestly go either way, though.
     
  6. Abwx

    Abwx Diamond Member

    Joined:
    Apr 2, 2011
    Messages:
    8,301
    Likes Received:
    174
    My bad for the wording. What I wanted to point out is that Intel
    used a feature already present in 3DNow!, not that SSE3
    has anything to do with 3DNow!. I guess ShintaiDK jumped
    eagerly on the interpretation that suits his agenda, though.
     
  7. SocketF

    SocketF Senior member

    Joined:
    Jun 2, 2006
    Messages:
    236
    Likes Received:
    0
    Concerning the console wins, I wonder whether they will prevent a fast move to AVX256. Like the Bulldozer cores, Jaguar splits AVX256 instructions into two 128-bit pieces, so it is better to stick with AVX128 from the beginning.
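    A simplified Python model of that splitting: a 256-bit operation on eight floats is executed as two 128-bit micro-ops, one per half. The "one micro-op per 128 bits" cost is an assumption of this sketch, not a timing figure for any particular core.

```python
LANES_128 = 4  # four 32-bit floats fit in one 128-bit half


def add_128(a, b):
    """One 128-bit micro-op: lane-wise add of four floats."""
    assert len(a) == len(b) == LANES_128
    return [x + y for x, y in zip(a, b)]


def add_256_cracked(a, b):
    """Run an 8-lane (256-bit) add on a 128-bit unit.

    Returns the combined result and the number of 128-bit micro-ops issued.
    """
    assert len(a) == len(b) == 2 * LANES_128
    low = add_128(a[:LANES_128], b[:LANES_128])
    high = add_128(a[LANES_128:], b[LANES_128:])
    return low + high, 2


result, uops = add_256_cracked([float(i) for i in range(8)], [10.0] * 8)
print(result, uops)
```

    The architectural result is the same as a native 256-bit add; what changes is that the core spends two 128-bit micro-ops on it.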

    Because of the shorter VEX prefix, AVX is still a bit better than the equivalent SSEx instructions; furthermore, you can use 3 operands instead of only 2.
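    The three-operand point can be shown at the byte level. The Python sketch below hand-encodes register-to-register forms for xmm0 to xmm7 only (no REX or 3-byte VEX handling, an intentional simplification) and compares the legacy SSE `movaps` + `addps` pair needed when the first source must be preserved with a single AVX `vaddps`. Note that VEX is not shorter in every case: a lone `addps xmm0, xmm1` is 3 bytes, its VEX form 4.

```python
def sse_reg_reg(opcode, dst, src, prefix=None):
    """Legacy SSE reg-reg form: [prefix] 0F opcode ModRM (2 operands, dst is overwritten)."""
    modrm = 0xC0 | (dst << 3) | src  # mod=11 selects register operands
    head = [prefix] if prefix is not None else []
    return bytes(head + [0x0F, opcode, modrm])


def vex2_reg_reg(opcode, dst, src1, src2, pp=0, l256=False):
    """2-byte VEX reg-reg form: C5 [R vvvv L pp] opcode ModRM (3 operands).

    R and vvvv are stored inverted; vvvv names the extra, non-destroyed source.
    """
    assert all(0 <= r < 8 for r in (dst, src1, src2)), "xmm0-xmm7 only in this sketch"
    byte1 = 0x80 | ((~src1 & 0xF) << 3) | (int(l256) << 2) | pp
    modrm = 0xC0 | (dst << 3) | src2
    return bytes([0xC5, byte1, opcode, modrm])


MOVAPS, ADDPS = 0x28, 0x58

# Goal: xmm2 = xmm0 + xmm1 while keeping xmm0 intact
sse = sse_reg_reg(MOVAPS, 2, 0) + sse_reg_reg(ADDPS, 2, 1)  # copy, then add in place
avx = vex2_reg_reg(ADDPS, 2, 0, 1)                          # one non-destructive add
print(sse.hex(" "), len(sse))  # 0f 28 d0 0f 58 d1 6
print(avx.hex(" "), len(avx))  # c5 f8 58 d1 4
```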

    So it might help PCs with FX processors a bit, depending on how much effort the publishers put into the PC ports.
     
  8. Olikan

    Olikan Golden Member

    Joined:
    Sep 23, 2011
    Messages:
    1,847
    Likes Received:
    7
    Oops!

    2 years' delay :eek:
     
  9. ShintaiDK

    ShintaiDK Lifer

    Joined:
    Apr 22, 2012
    Messages:
    20,395
    Likes Received:
    128
    There is a benefit, even when using 2 cycles to execute it.

    We also only got single-cycle SSE with Core 2, and SSE was heavily used before that.
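    One way to see where a benefit can come from even at two cycles per instruction: on a core with a 128-bit datapath, AVX256 halves the number of instructions the front end fetches and decodes, while the number of 128-bit execution micro-ops stays the same. A back-of-the-envelope Python model, assuming one 128-bit micro-op per 128 bits of data and ignoring loads, stores and loop overhead:

```python
import math

FLOAT_BITS = 32
DATAPATH_BITS = 128  # assumed execution width of the core


def cost(n_floats, vector_bits):
    """Return (instructions decoded, 128-bit micro-ops executed) for n lane-wise adds."""
    lanes = vector_bits // FLOAT_BITS
    instructions = math.ceil(n_floats / lanes)
    uops_per_instruction = vector_bits // DATAPATH_BITS
    return instructions, instructions * uops_per_instruction


print(cost(1024, 128))  # (256, 256)
print(cost(1024, 256))  # (128, 256)
```

    Whether fewer decoded instructions turn into measurable speed depends on whether the front end was a bottleneck in the first place.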
     
    #1309 ShintaiDK, Feb 8, 2013
    Last edited: Feb 8, 2013
  10. SocketF

    SocketF Senior member

    Joined:
    Jun 2, 2006
    Messages:
    236
    Likes Received:
    0
    Which one? AMD changed their part of the GCC compiler to generate AVX128 only. Guess why... yes, it is a few % faster.
    That's not the point. The point is that the consoles will probably have Jaguar with AVX128 only, and that won't change now. Intel already has single-cycle AVX256 execution today, but that is of no concern: Jaguar is inside, not Intel.

    Knowing that the consoles' single-thread performance will be quite low due to low clocks, I assume that console programmers will try to squeeze everything out of the Jaguar cores, i.e. use AVX128 only, not 256.
     
    #1310 SocketF, Feb 8, 2013
    Last edited: Feb 8, 2013
  11. ShintaiDK

    ShintaiDK Lifer

    Joined:
    Apr 22, 2012
    Messages:
    20,395
    Likes Received:
    128
    Haswell isn't released yet, so there's no single-cycle 256-bit AVX yet.
     
  12. Phynaz

    Phynaz Diamond Member

    Joined:
    Mar 13, 2006
    Messages:
    9,731
    Likes Received:
    523
    Methinks you are confused.
     
  13. SocketF

    SocketF Senior member

    Joined:
    Jun 2, 2006
    Messages:
    236
    Likes Received:
    0
    Seems you're mixing up AVX with FMA. AVX has been supported since Sandy Bridge and yes, 256 bits in one cycle.

    But this is not the topic here anyway.
    Sorry, I meant performance; you are right, that doesn't make sense.
     
  14. ShintaiDK

    ShintaiDK Lifer

    Joined:
    Apr 22, 2012
    Messages:
    20,395
    Likes Received:
    128
    256-bit AVX instructions take 2 cycles on SB/IB. The datapaths on those CPUs are also only 128 bits wide.

    If it were single-cycle, the difference would have been huge:
    % gain/loss, AVX256 vs AVX128
    (negative % = loss, positive % = gain)

    Benchmark        AMD BD   Intel SB
    410.bwaves        -2.34      -1.52
    416.gamess        -1.11      -0.30
    433.milc           0.47      -1.75
    434.zeusmp        -3.61       0.68
    435.gromacs       -0.54      -0.38
    436.cactusADM    -23.56      21.49
    437.leslie3d      -0.44       1.56
    444.namd           0.00       0.00
    447.dealII        -0.36      -0.23
    450.soplex        -0.43      -0.29
    453.povray         0.50       3.63
    454.calculix      -8.29       1.38
    459.GemsFDTD       2.37      -1.54
    465.tonto          0.00       0.00
    470.lbm            0.00       0.21
    481.wrf           -4.80       0.00
    482.sphinx3      -10.20      -3.65
    SpecINT           -3.29       1.01

    400.perlbench      0.93       1.47
    401.bzip2          0.60       0.00
    403.gcc            0.00       0.00
    429.mcf            0.00      -0.36
    445.gobmk         -1.03       0.37
    456.hmmer         -0.64       0.38
    458.sjeng          1.74       0.00
    462.libquantum     0.31       0.00
    464.h264ref        0.00       0.00
    471.omnetpp       -1.27       0.00
    473.astar          0.00       0.46
    483.xalancbmk      0.51       0.00
    SpecFP             0.09       0.19
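    The summary rows can be reproduced from the per-benchmark figures: they appear to be geometric means of the per-benchmark speed ratios (1 + gain/100), which is how SPEC aggregates scores. A short Python check on the first group is below. As an aside, 410.bwaves through 482.sphinx3 are SPEC CFP2006 benchmarks and 400.perlbench through 483.xalancbmk are SPEC CINT2006, so the SpecINT and SpecFP labels on the two summary rows look swapped.

```python
import math

# Per-benchmark % gain (+) / loss (-) of AVX256 over AVX128, copied from the
# first group of the table above (410.bwaves ... 482.sphinx3), in table order.
AMD_BD = [-2.34, -1.11, 0.47, -3.61, -0.54, -23.56, -0.44, 0.00, -0.36,
          -0.43, 0.50, -8.29, 2.37, 0.00, 0.00, -4.80, -10.20]
INTEL_SB = [-1.52, -0.30, -1.75, 0.68, -0.38, 21.49, 1.56, 0.00, -0.23,
            -0.29, 3.63, 1.38, -1.54, 0.00, 0.21, 0.00, -3.65]


def geomean_gain(percent_gains):
    """Geometric mean of speed ratios, reported back as a % gain/loss."""
    log_sum = sum(math.log(1.0 + p / 100.0) for p in percent_gains)
    return (math.exp(log_sum / len(percent_gains)) - 1.0) * 100.0


print(f"AMD BD:   {geomean_gain(AMD_BD):+.2f}%")    # -3.29%
print(f"Intel SB: {geomean_gain(INTEL_SB):+.2f}%")  # +1.01%
```

    A geometric mean keeps one outlier (436.cactusADM here) from dominating the summary the way it would in an arithmetic average of percentages.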
     
    #1314 ShintaiDK, Feb 8, 2013
    Last edited: Feb 8, 2013
  15. Haserath

    Haserath Senior member

    Joined:
    Sep 12, 2010
    Messages:
    794
    Likes Received:
    1
    http://www.realworldtech.com/sandy-bridge/6/
    ?
     
  16. Abwx

    Abwx Diamond Member

    Joined:
    Apr 2, 2011
    Messages:
    8,301
    Likes Received:
    174
    That's 2 double-precision FLOPs/cycle/core, 2 x 64-bit ops.
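    The arithmetic behind a per-core FLOPs/cycle figure is lanes per vector times the number of vector FP pipes that can start an operation each cycle. A Python sketch with width and pipe count as inputs; the second call assumes one 256-bit add pipe plus one 256-bit multiply pipe per core (the commonly cited Sandy Bridge layout), and multiplying by a core count gives a per-chip figure.

```python
def peak_flops_per_cycle(vector_bits, element_bits, fp_pipes):
    """Peak FLOPs per cycle per core: lanes per vector x pipes issuing each cycle.

    Assumes every pipe accepts one full-width vector op per cycle and that
    each op counts as one FLOP per lane (no fused multiply-add).
    """
    lanes = vector_bits // element_bits
    return lanes * fp_pipes


print(peak_flops_per_cycle(128, 64, 1))      # 2: one 128-bit pipe, double precision
print(peak_flops_per_cycle(256, 64, 2))      # 8: 256-bit add pipe + 256-bit mul pipe
print(peak_flops_per_cycle(256, 64, 2) * 8)  # 64: same, summed over a hypothetical 8-core chip
```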
     
  17. SocketF

    SocketF Senior member

    Joined:
    Jun 2, 2006
    Messages:
    236
    Likes Received:
    0
    I am talking about this:
    I thought it was clear from that that the topic is decoding, not execution.

    Speaking in general about "2 cycle execution" of any AVX256 instruction does not make sense either; e.g. multiplications take many more cycles than additions.

    Some comments to your numbers:

    a) Using AVX for SpecINT is "suboptimal" because AVX256 is usable only for FP. 256-bit integer AVX is called AVX2, and for that you really do have to wait for Haswell. So I assume there are some other reasons for the bad results, but they have nothing to do with 256-bit vs. 128-bit, because integer ops are only 128-bit anyway. Source: http://www.drdobbs.com/tools/intel-avx2-will-bring-integer-instructio/231000372

    b) For SpecFP: code never consists purely of AVX256 parts; there is lots of other code. Check out the explanation for the y-cruncher program as an example:
    http://www.numberworld.org/y-cruncher/

    Conclusion: just because the scores are not doubled does not mean that the execution units were not doubled.
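    The "lots of other code" point can be quantified with Amdahl's law (a standard model, not something the y-cruncher page itself is claimed to use): if a fraction p of the runtime speeds up by a factor s, the overall speedup is 1 / ((1 - p) + p / s). The fractions below are made-up illustrations.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of runtime is accelerated by factor s."""
    if not 0.0 <= p <= 1.0 or s <= 0:
        raise ValueError("need 0 <= p <= 1 and s > 0")
    return 1.0 / ((1.0 - p) + p / s)


# Hypothetical: doubled vector throughput (s = 2) for various vectorized fractions p.
for p in (0.1, 0.3, 0.5, 0.9):
    print(f"p={p:.1f}: {amdahl_speedup(p, 2.0):.3f}x")
```

    Even with 90% of the runtime in perfectly doubled code, the whole program gets about 1.8x, not 2x.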
     
    #1317 SocketF, Feb 8, 2013
    Last edited: Feb 8, 2013
  18. SocketF

    SocketF Senior member

    Joined:
    Jun 2, 2006
    Messages:
    236
    Likes Received:
    0
    No, the numbers were already for one core only. The article is about the Sandy Bridge architecture, not about some Sandy Bridge quad core.
    Edit: That is the important part:
    Furthermore, there are also Sandy Bridge Xeons with 8 cores... think about what it would mean if you applied your math in that case, too ;-)
     
    #1318 SocketF, Feb 8, 2013
    Last edited: Feb 8, 2013
  19. inf64

    inf64 Platinum Member

    Joined:
    Mar 11, 2011
    Messages:
    2,779
    Likes Received:
    1,043
    Are we sure SB/IB can sustain 2x 256-bit ops per core per cycle? Isn't it limited by the effective load/store bandwidth?
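    The load/store question can be framed as a simple roofline-style bound: sustained FLOPs per cycle is the smaller of the compute peak and what the loads can feed. A Python sketch; the bytes-per-cycle and intensity values are hypothetical inputs, not measured SB/IB figures.

```python
def attainable_flops_per_cycle(peak_flops, load_bytes_per_cycle, flops_per_byte):
    """Roofline bound: min(compute peak, load bandwidth x arithmetic intensity)."""
    return min(peak_flops, load_bytes_per_cycle * flops_per_byte)


# Hypothetical streaming kernel, e.g. summing doubles: one add per 8-byte element.
print(attainable_flops_per_cycle(8, 32, 1 / 8))  # 4.0: load-bound
# Hypothetical compute-heavy kernel that reuses loaded data many times.
print(attainable_flops_per_cycle(8, 32, 2.0))    # 8: compute-bound
```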

    Anyway, as SocketF said, Jaguar is inside the next-gen consoles (or SR :D) and it supports basic AVX1.1. Devs will probably use AVX128, but that's not a big issue since I doubt games would benefit much from 256-bit FP ops anyway. Game code is usually integer-heavy and branch-heavy, so 256-bit FP ops are probably useless there (apart from maybe some physics on the CPU, but they have a GCN core on die anyway, which can do a better job).
     
  20. Nemesis 1

    Nemesis 1 Lifer

    Joined:
    Dec 30, 2006
    Messages:
    11,379
    Likes Received:
    0
    What, you guys just want to ignore the facts? AMD has their own prefix; they do not have, nor will they ever have, the vec prefix. It's Intel-exclusive and works with Intel software/hardware only, for auto recompile, or at runtime I believe. No more red herrings. I spent too much time debating this with the AMD fellow who spread his good cheer on this very forum and got me banned for a few days. Truth always rises to the surface. Of course the general public didn't know about AVX2. I see it as Intel's old Mitosis: the same-elements instruction set coupled with hardware/software for 2x+ performance increases, less in some cases. But what the hell do I know, I can't spell and my grammar sucks. No one takes someone who can't spell and has poor grammar seriously. Hiding in the open.
     
    #1320 Nemesis 1, Feb 8, 2013
    Last edited: Feb 8, 2013
  21. inf64

    inf64 Platinum Member

    Joined:
    Mar 11, 2011
    Messages:
    2,779
    Likes Received:
    1,043
    AMD supports the same VEX encoding as Intel does... How do you think it can run AVX code, by magic pixie dust? What AMD does differently is the encoding for its own proprietary XOP ISA, a set of media instructions unique to AMD.
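    The two encodings can be told apart from the first bytes of an instruction. A minimal Python classifier, assuming 64-bit mode (in 32-bit mode C4/C5 can also be LES/LDS) and looking only at the escape byte plus the map-select field: VEX uses C5 (2-byte form) or C4 (3-byte form), while AMD's XOP uses 8F with a map-select value of 8 or higher, which is what keeps it distinct from the ordinary `POP r/m` opcode 8F /0.

```python
def classify(code: bytes) -> str:
    """Classify an x86-64 instruction by its leading escape/prefix byte.

    Simplified: legacy prefixes (66, F2, F3, REX) before the escape byte are not handled.
    """
    if not code:
        raise ValueError("empty instruction")
    first = code[0]
    if first == 0xC5:
        return "VEX (2-byte)"
    if first == 0xC4:
        return "VEX (3-byte)"
    if first == 0x8F and len(code) > 1 and (code[1] & 0x1F) >= 8:
        return "XOP"  # map_select (low 5 bits) of 8..10 marks XOP rather than POP
    return "legacy"


print(classify(bytes.fromhex("c5f858c1")))  # VEX (2-byte), i.e. vaddps xmm0, xmm0, xmm1
print(classify(bytes.fromhex("8fc0")))      # legacy, i.e. pop rax
print(classify(bytes.fromhex("8fe8")))      # XOP, map 8 (remaining bytes omitted)
```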
     
  22. Nemesis 1

    Nemesis 1 Lifer

    Joined:
    Dec 30, 2006
    Messages:
    11,379
    Likes Received:
    0
    JUST STOP! I am not talking about the instruction set. The VEC or VEX prefix is an Intel exclusive. It works with Intel hardware and software together. AMD does not have this software or hardware. They may have a prefix, but it's not VEC. It's for auto recompile at run time, I believe; not sure. So now you're going to say AMD has Intel compilers. I know they can run Intel compilers, but it won't work the same, and that's legal. This is not the result AMD and NV wanted when they complained to the FTC about Intel's compilers. The change Intel had to make to keep the FTC happy was to label the compilers as not performing as well on non-Intel products. A big win for Intel.
     
    #1322 Nemesis 1, Feb 8, 2013
    Last edited: Feb 8, 2013
  23. Nemesis 1

    Nemesis 1 Lifer

    Joined:
    Dec 30, 2006
    Messages:
    11,379
    Likes Received:
    0
    MODS, I WANT NO MORE RED HERRINGS FROM THIS MAN. The info is freely available and he is not telling the truth, as we have all witnessed since 2006.
     
    #1323 Nemesis 1, Feb 8, 2013
    Last edited: Feb 8, 2013
  24. SocketF

    SocketF Senior member

    Joined:
    Jun 2, 2006
    Messages:
    236
    Likes Received:
    0
    As long as you don't use Intel's compiler (there are lots of others), you can use AVX, including VEX-prefixed instructions, on AMD chips too. They are 100% compatible.

    Intel should also soon provide a compiler option for "slow-AMD-AVX" code. Even if it is slower, it will of course use the VEX prefix. Prefixes are hardware; you are talking about software.
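    The practical upshot is that whether an AVX code path can run is decided by the CPU's feature flags, not by its vendor. A minimal Python sketch of feature-based dispatch that parses the `flags` line of Linux `/proc/cpuinfo`; the sample text and the code-path names are made up for illustration, while flag names such as `avx`, `fma4` and `xop` are the ones the Linux kernel reports.

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Return the feature flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def pick_code_path(flags: set) -> str:
    """Choose a (hypothetical) code path purely from features, never from the vendor string."""
    if "avx" in flags:
        return "avx128"  # VEX-encoded 128-bit path; runs on any AVX-capable x86 CPU
    if "sse2" in flags:
        return "sse2"
    return "scalar"


# Hypothetical excerpt, loosely modeled on an AMD Bulldozer-family CPU.
SAMPLE = """vendor_id\t: AuthenticAMD
flags\t\t: fpu sse sse2 sse4a avx xop fma4
"""

flags = cpu_flags(SAMPLE)
print(sorted(flags & {"avx", "fma4", "xop", "avx2"}))  # ['avx', 'fma4', 'xop']
print(pick_code_path(flags))                           # avx128
```

    On a real Linux machine the input would be `open("/proc/cpuinfo").read()`; the vendor line is read past and ignored on purpose.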

    Funny side note: the y-cruncher programmer mentioned above stated that Microsoft's compiler generates better/faster AVX256 code for Intel CPUs than Intel's compiler ;-)
     
  25. inf64

    inf64 Platinum Member

    Joined:
    Mar 11, 2011
    Messages:
    2,779
    Likes Received:
    1,043
    Oh man, my fail for even trying to respond to that poster. Won't happen again; let him fall into the hole he dug himself :).

    For those who are interested in the VEX prefix/encoding scheme, Wikipedia is the easiest source.
    PS: The guy cannot tell the difference between a compiler (a piece of software) and an instruction encoding standard (which is what VEX is)...
     