Ryzen: Strictly technical

Discussion in 'CPUs and Overclocking' started by The Stilt, Mar 2, 2017.

  1. Insert_Nickname

    Insert_Nickname Diamond Member

    Joined:
    May 6, 2012
    Messages:
    3,143
    Likes Received:
    203
    My 1700 (non-X) gets ~12,300MB/s read and about the same write in the AIDA64 GPGPU benchmark.
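For context, here is a rough sketch of the theoretical PCIe 3.0 x16 ceiling. It assumes three things: AIDA64's GPGPU "Memory Read/Write" figures measure transfers between system memory and the graphics card over PCIe rather than DRAM bandwidth, the card sits in a Gen3 x16 slot, and AIDA64's MB means 10^6 bytes. The function name is hypothetical.

```python
def pcie_bandwidth_mb_s(gt_per_s: float, lanes: int,
                        payload_bits: int = 128, line_bits: int = 130) -> float:
    """Theoretical one-direction PCIe bandwidth in MB/s (1 MB = 10**6 bytes).

    PCIe 3.0 signals at 8 GT/s per lane and uses 128b/130b line encoding.
    Packet/protocol overhead is NOT subtracted, so real transfers top out lower.
    """
    bits_per_s = gt_per_s * 1e9 * lanes * payload_bits / line_bits
    return bits_per_s / 8 / 1e6


ceiling = pcie_bandwidth_mb_s(8.0, 16)  # PCIe 3.0 x16
measured = 12300.0                       # ~12300 MB/s as reported in the post
print(f"PCIe 3.0 x16 ceiling: {ceiling:.0f} MB/s")
print(f"measured / ceiling: {measured / ceiling:.0%}")
```

Under those assumptions, the reported figure is a bit under 80% of the raw link rate. That is roughly where PCIe transfers land once packet overhead is included.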
     
  2. looncraz

    looncraz Senior member

    Joined:
    Sep 12, 2011
    Messages:
    715
    Likes Received:
    1,634
    Windows load-balances the cores, so the heavy-hitter threads are moved around between different cores (but not onto the SMT thread of the same core) every 10ms or so (the Windows kernel scheduling interrupt interval). As was mentioned, you're seeing an average over 0.5 seconds or more, so it will appear that no core is being fully utilized - but they are... momentarily.

    This process, though, causes a few issues on Ryzen.

    1. It effectively prevents 'AI' prefetch adaptation
    - so 10~15% of its total performance is lost right there (if AMD is to be believed)

    2. It shuffles data across CCXes about 50% of the time.
    - This damages data locality and causes new fetches from memory.

    3a. A driver may detect this cache behavior and then load up VRAM to the max for better performance...
    OR
    3b. nVidia is intentionally loading more data on AMD CPUs... for whatever reason.
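Here is a quick way to sanity-check the "about 50%" figure in point 2. Model an 8-core Ryzen as two CCXes of four cores each, and assume each migration picks its destination core uniformly at random. That is a simplifying assumption, since the real Windows scheduler is not uniform. The odds of landing on the other CCX then follow directly. The helper below is hypothetical.

```python
from fractions import Fraction


def cross_ccx_probability(cores_per_ccx: int, ccx_count: int,
                          allow_same_core: bool) -> Fraction:
    """Chance that a uniformly random migration lands on a different CCX.

    allow_same_core=True:  destination drawn from all cores (thread may stay put).
    allow_same_core=False: destination drawn from every core except the current one.
    """
    total = cores_per_ccx * ccx_count
    on_other_ccx = total - cores_per_ccx
    candidates = total if allow_same_core else total - 1
    return Fraction(on_other_ccx, candidates)


print(cross_ccx_probability(4, 2, allow_same_core=True))   # 1/2
print(cross_ccx_probability(4, 2, allow_same_core=False))  # 4/7
```

Under this model the answer lies between 50% and about 57%, which is consistent with the estimate above. Actual behaviour depends on how Windows really picks cores.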
     
  3. Valantar

    Valantar Golden Member

    Joined:
    Aug 26, 2014
    Messages:
    1,722
    Likes Received:
    436
    Interesting. In my eyes that's a change for the better, as it makes the terminology clearer. But you're saying this isn't officially supported? So there's no chance we'll see motherboards allowing us to run, say, a 1700 at 35W for UCFF-style builds? At least it might bode well for later low-power SKUs, I suppose, especially Raven Ridge.
     
  4. Harney

    Harney Junior Member

    Joined:
    Mar 4, 2017
    Messages:
    3
    Likes Received:
    1
    Agreed, Stilt's work is better than most reviews I have seen too.

    Thanks, Stilt, for this great work... long live Win 7!
     
  5. starheap

    starheap Junior Member

    Joined:
    Mar 4, 2017
    Messages:
    5
    Likes Received:
    0
    Here is my Coreinfo output on Windows 10 with an 1800X:

    Code:
    Logical to Physical Processor Map:
    **--------------  Physical Processor 0 (Hyperthreaded)
    --**------------  Physical Processor 1 (Hyperthreaded)
    ----**----------  Physical Processor 2 (Hyperthreaded)
    ------**--------  Physical Processor 3 (Hyperthreaded)
    --------**------  Physical Processor 4 (Hyperthreaded)
    ----------**----  Physical Processor 5 (Hyperthreaded)
    ------------**--  Physical Processor 6 (Hyperthreaded)
    --------------**  Physical Processor 7 (Hyperthreaded)
    
    Logical Processor to Socket Map:
    ****************  Socket 0
    
    Logical Processor to NUMA Node Map:
    ****************  NUMA Node 0
    
    No NUMA nodes.
    
    Logical Processor to Cache Map:
    **--------------  Data Cache          0, Level 1,   32 KB, Assoc   8, LineSize  64
    **--------------  Instruction Cache   0, Level 1,   64 KB, Assoc   4, LineSize  64
    **--------------  Unified Cache       0, Level 2,  512 KB, Assoc   8, LineSize  64
    ********--------  Unified Cache       1, Level 3,    8 MB, Assoc  16, LineSize  64
    --**------------  Data Cache          1, Level 1,   32 KB, Assoc   8, LineSize  64
    --**------------  Instruction Cache   1, Level 1,   64 KB, Assoc   4, LineSize  64
    --**------------  Unified Cache       2, Level 2,  512 KB, Assoc   8, LineSize  64
    ----**----------  Data Cache          2, Level 1,   32 KB, Assoc   8, LineSize  64
    ----**----------  Instruction Cache   2, Level 1,   64 KB, Assoc   4, LineSize  64
    ----**----------  Unified Cache       3, Level 2,  512 KB, Assoc   8, LineSize  64
    ------**--------  Data Cache          3, Level 1,   32 KB, Assoc   8, LineSize  64
    ------**--------  Instruction Cache   3, Level 1,   64 KB, Assoc   4, LineSize  64
    ------**--------  Unified Cache       4, Level 2,  512 KB, Assoc   8, LineSize  64
    --------**------  Data Cache          4, Level 1,   32 KB, Assoc   8, LineSize  64
    --------**------  Instruction Cache   4, Level 1,   64 KB, Assoc   4, LineSize  64
    --------**------  Unified Cache       5, Level 2,  512 KB, Assoc   8, LineSize  64
    --------********  Unified Cache       6, Level 3,    8 MB, Assoc  16, LineSize  64
    ----------**----  Data Cache          5, Level 1,   32 KB, Assoc   8, LineSize  64
    ----------**----  Instruction Cache   5, Level 1,   64 KB, Assoc   4, LineSize  64
    ----------**----  Unified Cache       7, Level 2,  512 KB, Assoc   8, LineSize  64
    ------------**--  Data Cache          6, Level 1,   32 KB, Assoc   8, LineSize  64
    ------------**--  Instruction Cache   6, Level 1,   64 KB, Assoc   4, LineSize  64
    ------------**--  Unified Cache       8, Level 2,  512 KB, Assoc   8, LineSize  64
    --------------**  Data Cache          7, Level 1,   32 KB, Assoc   8, LineSize  64
    --------------**  Instruction Cache   7, Level 1,   64 KB, Assoc   4, LineSize  64
    --------------**  Unified Cache       9, Level 2,  512 KB, Assoc   8, LineSize  64
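The following is a minimal sketch for reading the CCX layout out of Coreinfo text like the output above. Logical processors that share a Level 3 cache belong to the same CCX. It assumes Coreinfo's plain-text format exactly as shown, with the leftmost mask character being logical processor 0. `cache_groups` is a hypothetical helper and is not part of Coreinfo.

```python
import re

# Matches cache-map lines such as:
#   "********--------  Unified Cache       1, Level 3,    8 MB, Assoc  16, LineSize  64"
CACHE_LINE = re.compile(
    r"^\s*([*-]+)\s+(?:Data|Instruction|Unified) Cache\s+\d+, Level (\d)")


def cache_groups(coreinfo_text: str, level: int) -> list[list[int]]:
    """Logical-processor indices sharing each cache of the given level."""
    groups = []
    for line in coreinfo_text.splitlines():
        m = CACHE_LINE.match(line)
        if m and int(m.group(2)) == level:
            groups.append([i for i, ch in enumerate(m.group(1)) if ch == "*"])
    return groups


# The two Level 3 lines copied from the 1800X output above.
SAMPLE = """
********--------  Unified Cache       1, Level 3,    8 MB, Assoc  16, LineSize  64
--------********  Unified Cache       6, Level 3,    8 MB, Assoc  16, LineSize  64
"""

for ccx, cpus in enumerate(cache_groups(SAMPLE, level=3)):
    print(f"CCX {ccx}: logical processors {cpus}")
```

With `level=2`, the same helper returns the SMT sibling pairs, because each 512 KB L2 is shared only by the two logical processors of one core.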
    
     
  6. iBoMbY

    iBoMbY Member

    Joined:
    Nov 23, 2016
    Messages:
    156
    Likes Received:
    84
    Yes, I got mine today as well, and my Coreinfo output looks exactly like yours; it seems to be correct so far (though a NUMA node per CCX would be nice). Seems like they fixed that cache issue with a microcode update?
     
  7. Mockingbird

    Mockingbird Senior member

    Joined:
    Feb 12, 2017
    Messages:
    516
    Likes Received:
    380
    Yours is correct. I wonder why it's different from OP's (The Stilt's).

    Did MSFT release a stealth update to Windows 10?

     
  8. Mockingbird

    Mockingbird Senior member

    Joined:
    Feb 12, 2017
    Messages:
    516
    Likes Received:
    380
    Did MSFT release a stealth update to Windows 10 or something? :eek:
     
  9. The Stilt

    The Stilt Golden Member

    Joined:
    Dec 5, 2015
    Messages:
    1,582
    Likes Received:
    2,545
    I did some 3D testing, and even though there is not nearly enough data to confirm it, I'd say the SMT regression is in fact a Windows 10-related issue.
    In the 3D testing I did recently on Windows 10, the title which showed the biggest SMT regression was Total War: Warhammer.

    All of these were recorded at 3.5GHz with DDR4-2133 memory and an R9 Nano:

    Windows 10 - 1080p Ultra, DX11:

    8C/16T - 49.39fps (Min), 72.36fps (Avg)
    8C/8T - 57.16fps (Min), 72.46fps (Avg)

    Windows 7 - 1080p Ultra, DX11:

    8C/16T - 62.33fps (Min), 78.18fps (Avg)
    8C/8T - 62.00fps (Min), 73.22fps (Avg)

    At the moment this is pure speculation, as there were variables which could not be isolated.
    The Windows 10 figures were recorded using PresentMon (OCAT), whereas on Windows 7 it was necessary to use Fraps.
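The SMT On vs. Off deltas implied by the figures above can be worked out directly. Each delta compares 8C/16T against 8C/8T on the same OS. This is plain arithmetic on the posted numbers. As noted above, the two OSes were measured with different capture tools, so cross-OS comparisons carry that caveat.

```python
# (min fps, avg fps) as posted: 3.5GHz, 2133 memory, R9 Nano, 1080p Ultra DX11
RESULTS = {
    ("Windows 10", "8C/16T"): (49.39, 72.36),
    ("Windows 10", "8C/8T"): (57.16, 72.46),
    ("Windows 7", "8C/16T"): (62.33, 78.18),
    ("Windows 7", "8C/8T"): (62.00, 73.22),
}


def smt_delta_percent(os_name: str) -> tuple[float, float]:
    """Percent change of (min, avg) fps with SMT on relative to SMT off."""
    on_min, on_avg = RESULTS[(os_name, "8C/16T")]
    off_min, off_avg = RESULTS[(os_name, "8C/8T")]
    return (on_min / off_min - 1) * 100, (on_avg / off_avg - 1) * 100


for os_name in ("Windows 10", "Windows 7"):
    d_min, d_avg = smt_delta_percent(os_name)
    print(f"{os_name}: SMT on vs off  min {d_min:+.1f}%  avg {d_avg:+.1f}%")
```

On Windows 10, enabling SMT costs about 13.6% on the minimum and leaves the average essentially flat. On Windows 7, SMT gains about 0.5% on the minimum and about 6.8% on the average.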
     
  10. The Stilt

    The Stilt Golden Member

    Joined:
    Dec 5, 2015
    Messages:
    1,582
    Likes Received:
    2,545
    At least at the moment that is the case.
    However, since the feature can be easily "added" I wouldn't be too surprised if it would become officially available at some point.
     
  11. Kromaatikse

    Kromaatikse Member

    Joined:
    Mar 4, 2017
    Messages:
    83
    Likes Received:
    169
    Interestingly, my Kaveri APU on Win7 also has an incorrect cache mapping according to Coreinfo. Can anyone with Kaveri, Carrizo, Bristol Ridge or Vishera confirm this on Win10?

    Code:
    Coreinfo v3.31 - Dump information on system CPU and memory topology
    Copyright (C) 2008-2014 Mark Russinovich
    Sysinternals - www.sysinternals.com
    
    AMD A10-7850K Radeon R7, 12 Compute Cores 4C+8G
    AMD64 Family 21 Model 48 Stepping 1, AuthenticAMD
    
    <snip>
    
    Maximum implemented CPUID leaves: 0000000D (Basic), 8000001E (Extended).
    
    Logical to Physical Processor Map:
    **--  Physical Processor 0 (Hyperthreaded)
    --**  Physical Processor 1 (Hyperthreaded)
    
    Logical Processor to Socket Map:
    ****  Socket 0
    
    Logical Processor to NUMA Node Map:
    ****  NUMA Node 0
    
    No NUMA nodes.
    
    Logical Processor to Cache Map:
    *---  Data Cache          0, Level 1,   16 KB, Assoc   4, LineSize  64
    *---  Instruction Cache   0, Level 1,   96 KB, Assoc   3, LineSize  64
    *---  Unified Cache       0, Level 2,    2 MB, Assoc  16, LineSize  64
    -*--  Data Cache          1, Level 1,   16 KB, Assoc   4, LineSize  64
    -*--  Instruction Cache   1, Level 1,   96 KB, Assoc   3, LineSize  64
    -*--  Unified Cache       1, Level 2,    2 MB, Assoc  16, LineSize  64
    --*-  Data Cache          2, Level 1,   16 KB, Assoc   4, LineSize  64
    --*-  Instruction Cache   2, Level 1,   96 KB, Assoc   3, LineSize  64
    --*-  Unified Cache       2, Level 2,    2 MB, Assoc  16, LineSize  64
    ---*  Data Cache          3, Level 1,   16 KB, Assoc   4, LineSize  64
    ---*  Instruction Cache   3, Level 1,   96 KB, Assoc   3, LineSize  64
    ---*  Unified Cache       3, Level 2,    2 MB, Assoc  16, LineSize  64
    
    Logical Processor to Group Map:
    ****  Group 0
    
    The correct layout should be something like:

    Code:
    Logical Processor to Cache Map:
    *---  Data Cache          0, Level 1,   16 KB, Assoc   4, LineSize  64
    **--  Instruction Cache   0, Level 1,   96 KB, Assoc   3, LineSize  64
    **--  Unified Cache       0, Level 2,    2 MB, Assoc  16, LineSize  64
    -*--  Data Cache          1, Level 1,   16 KB, Assoc   4, LineSize  64
    --*-  Data Cache          2, Level 1,   16 KB, Assoc   4, LineSize  64
    --**  Instruction Cache   2, Level 1,   96 KB, Assoc   3, LineSize  64
    --**  Unified Cache       2, Level 2,    2 MB, Assoc  16, LineSize  64
    ---*  Data Cache          3, Level 1,   16 KB, Assoc   4, LineSize  64
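The "correct" layout above follows from the CMT module design described in this post. Each module holds two cores with private L1 data caches, while both cores share the module's L1 instruction cache and L2. Below is a small sketch that generates Coreinfo-style masks under that assumption. The function is hypothetical.

```python
def cmt_cache_map(modules: int, cores_per_module: int = 2) -> list[tuple[str, str]]:
    """Coreinfo-style (mask, description) rows for a CMT design where the
    L1 data cache is per core and the L1 instruction cache and L2 are per module."""
    n = modules * cores_per_module

    def mask(cpus: set[int]) -> str:
        # Leftmost character is logical processor 0, as in Coreinfo.
        return "".join("*" if i in cpus else "-" for i in range(n))

    rows = []
    for m in range(modules):
        module_cpus = set(range(m * cores_per_module, (m + 1) * cores_per_module))
        for cpu in sorted(module_cpus):
            rows.append((mask({cpu}), "L1 data, per core"))
        rows.append((mask(module_cpus), "L1 instruction, shared per module"))
        rows.append((mask(module_cpus), "L2 unified, shared per module"))
    return rows


for row_mask, desc in cmt_cache_map(modules=2):
    print(f"{row_mask}  {desc}")
```

For a two-module part such as the A10-7850K, this yields the same masks as the hand-written layout above, listed in a different order.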
    
     
  12. imported_jjj

    imported_jjj Senior member

    Joined:
    Feb 14, 2009
    Messages:
    660
    Likes Received:
    430
    I am gonna report your post, for being awesome lol.

    EDIT: You are not getting a higher average FPS with SMT disabled in Win 10.
    It might be better to test at a lower resolution; that increases the difference and makes it easier to quantify.
    The memory could be a bottleneck at 2133, considering that the data fabric runs at the same clock; that might pollute the data.
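Here are some numbers for the fabric-clock point. DDR4 transfers data on both clock edges, so MEMCLK is half the MT/s rating. On first-gen Ryzen the data fabric runs 1:1 with MEMCLK. The sketch below uses nominal ratings (DDR4-2133 is really 2133.33 MT/s) and computes only a theoretical dual-channel peak, not a measured figure. The function names are hypothetical.

```python
def memclk_mhz(ddr_rating: float) -> float:
    # DDR moves data on both clock edges, so MEMCLK is half the MT/s rating.
    return ddr_rating / 2


def fabric_clock_mhz(ddr_rating: float) -> float:
    # First-gen Ryzen: the data fabric clock runs 1:1 with MEMCLK.
    return memclk_mhz(ddr_rating)


def peak_bandwidth_mb_s(ddr_rating: float, channels: int = 2) -> float:
    # Each DDR4 channel is 64 bits (8 bytes) wide; theoretical peak only.
    return ddr_rating * 8 * channels


for rating in (2133, 2667, 3200):
    print(f"DDR4-{rating}: fabric ~{fabric_clock_mhz(rating):.0f} MHz, "
          f"dual-channel peak ~{peak_bandwidth_mb_s(rating):.0f} MB/s")
```

Because the fabric clock scales with memory speed, faster DIMMs speed up cross-CCX traffic as well as DRAM access. This is why 2133 results may understate what the chip can do.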
     
    #187 imported_jjj, Mar 4, 2017
  13. PotatoWithEarsOnSide

    Joined:
    Feb 23, 2017
    Messages:
    108
    Likes Received:
    109
    Code:
    Coreinfo v3.31 - Dump information on system CPU and memory topology
    Copyright (C) 2008-2014 Mark Russinovich
    Sysinternals - www.sysinternals.com
    
    AMD A10-7300 Radeon R6, 10 Compute Cores 4C+6G
    AMD64 Family 21 Model 48 Stepping 1, AuthenticAMD
    Microcode signature: 06003106
    
    Maximum implemented CPUID leaves: 0000000D (Basic), 8000001E (Extended).
    
    Logical to Physical Processor Map:
    *---  Physical Processor 0
    -*--  Physical Processor 1
    --*-  Physical Processor 2
    ---*  Physical Processor 3
    
    Logical Processor to Socket Map:
    ****  Socket 0
    
    Logical Processor to NUMA Node Map:
    ****  NUMA Node 0
    
    No NUMA nodes.
    
    Logical Processor to Cache Map:
    *---  Data Cache          0, Level 1,   16 KB, Assoc   4, LineSize  64
    *---  Instruction Cache   0, Level 1,   96 KB, Assoc   3, LineSize  64
    *---  Unified Cache       0, Level 2,    2 MB, Assoc  16, LineSize  64
    -*--  Data Cache          1, Level 1,   16 KB, Assoc   4, LineSize  64
    -*--  Instruction Cache   1, Level 1,   96 KB, Assoc   3, LineSize  64
    -*--  Unified Cache       1, Level 2,    2 MB, Assoc  16, LineSize  64
    --*-  Data Cache          2, Level 1,   16 KB, Assoc   4, LineSize  64
    --*-  Instruction Cache   2, Level 1,   96 KB, Assoc   3, LineSize  64
    --*-  Unified Cache       2, Level 2,    2 MB, Assoc  16, LineSize  64
    ---*  Data Cache          3, Level 1,   16 KB, Assoc   4, LineSize  64
    ---*  Instruction Cache   3, Level 1,   96 KB, Assoc   3, LineSize  64
    ---*  Unified Cache       3, Level 2,    2 MB, Assoc  16, LineSize  64
    
    Logical Processor to Group Map:
    ****  Group 0

    This is Kaveri on Win 10
     
  14. inf64

    inf64 Platinum Member

    Joined:
    Mar 11, 2011
    Messages:
    2,786
    Likes Received:
    1,056
    Any chance "Someone" might release a 3rd-party app that could enable this functionality? :)
    Or does this have to be done at a low level (firmware)?
     
  15. piesquared

    piesquared Golden Member

    Joined:
    Oct 16, 2006
    Messages:
    1,530
    Likes Received:
    362
    This thread should be pinned. By far the most in depth investigating of the chip.
     
  16. Jan Olšan

    Jan Olšan Senior member

    Joined:
    Jan 12, 2017
    Messages:
    231
    Likes Received:
    221
  17. Kromaatikse

    Kromaatikse Member

    Joined:
    Mar 4, 2017
    Messages:
    83
    Likes Received:
    169
    That's the same then, except that Win10 doesn't consider CMT to be equivalent to HyperThreading. Or maybe your BIOS reports it differently from mine (ASRock).

    ETA: Also, Coreinfo reports a microcode signature on yours, but not on mine...
     
  18. Jan Olšan

    Jan Olšan Senior member

    Joined:
    Jan 12, 2017
    Messages:
    231
    Likes Received:
    221
    Is it possible that the W10 performance regressions are related to some of the potential sources of performance problems that AMD's reviewer guides mention?
    Specifically:
    1) use of the "Balanced" power profile in W10 (only the "High Performance" profile is supposed to be optimal)
    2) HPET enabled in the BIOS (supposedly harms performance too, which I hope eventually gets fixed).

    Sorry if this was asked already. I suspect you know these things already, so sorry for doubting you.
     
  19. CatMerc

    CatMerc Golden Member

    Joined:
    Jul 16, 2016
    Messages:
    1,007
    Likes Received:
    943
    So you believe that in two months' time or so, the 1800X will perform in games, relative to the 6900K, like it does in most synthetics?
     
  20. The Stilt

    The Stilt Golden Member

    Joined:
    Dec 5, 2015
    Messages:
    1,582
    Likes Received:
    2,545
    No.
    I've checked the performance with both the "Balanced" and "High-Performance" profiles, and with both HPET and TSC timing. The minor differences are common to both the SMT On and SMT Off conditions.
     
  21. The Stilt

    The Stilt Golden Member

    Joined:
    Dec 5, 2015
    Messages:
    1,582
    Likes Received:
    2,545
    There will be improvements, but it is impossible to say how large or tiny they might be.
    There are various, completely isolated regions where the improvements will occur.
     
  22. otinane

    otinane Member

    Joined:
    Oct 13, 2016
    Messages:
    68
    Likes Received:
    13
    I might be out of my depth here, but does Windows support irqbalance?
     
  23. The Stilt

    The Stilt Golden Member

    Joined:
    Dec 5, 2015
    Messages:
    1,582
    Likes Received:
    2,545
    Personally I would like to have it implemented in the BIOS, so that end-users would not need to use third-party tools with ring 0 access.
    Time will tell ;)
     
  24. CatMerc

    CatMerc Golden Member

    Joined:
    Jul 16, 2016
    Messages:
    1,007
    Likes Received:
    943
    Thanks for your work Stilt, really interesting stuff.
     
  25. looncraz

    looncraz Senior member

    Joined:
    Sep 12, 2011
    Messages:
    715
    Likes Received:
    1,634
    Can you see whether applying all Windows 10 updates makes any difference in performance?

    Maybe try the Insider Fast ring as well?
     