Ryzen: Strictly technical

Discussion in 'CPUs and Overclocking' started by The Stilt, Mar 2, 2017.

  1. arandomguy

    arandomguy Senior member

    Joined:
    Sep 3, 2013
    Messages:
    216
    Likes Received:
    9
    I'm disappointed in what XFR turned out to be versus what I initially thought it could be, if I'm understanding it correctly. It still basically behaves like existing CPU turbos, in that there are essentially only two steps, ST and all-core, with no intermediate states, correct? As opposed to GPU turbos, which have much finer-grained, dynamic clock speeds that are actually based on the workload?
     
  2. bjt2

    bjt2 Senior member

    Joined:
    Sep 11, 2016
    Messages:
    784
    Likes Received:
    180
    The dLDO is also present in Bristol Ridge, according to an ISSCC paper. Moreover, Hiroshige Goto posted some slides from an AMD presentation weeks ago, where it was stated that the dLDO was active. Whether the dLDO is active in Ryzen can easily be verified: measure the low-load or idle power consumption with and without OC mode enabled.
     
  3. tamz_msc

    tamz_msc Senior member

    Joined:
    Jan 5, 2017
    Messages:
    614
    Likes Received:
    466
    First of all big thanks for the detailed results.

    Question: Do you think the performance in NBody, Linpack etc., which use FMA, can be hindered by the cache? You mentioned that the L3 frequency cannot be set independently within the CCX. Maybe because changing frequencies as data is moved around could affect coherency? NBody is a coupled system of differential equations, and the Gaussian elimination (LU factorization) used in Linpack does move a lot of data around.

    Follow-up question: How can the actual CPU core voltage, rather than the VID, be measured in software?
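On Linux, one software route is the hwmon sysfs interface, where the motherboard's Super I/O sensor chip exposes voltage inputs as inN_input files (in millivolts), optionally with an inN_label. A minimal sketch, assuming that layout: which channel actually carries Vcore, and how the driver scales it, depends on the board (an assumption here), and it is still a sensor reading rather than a calibrated measurement. On some older kernels these files sit under hwmonN/device/, which this sketch does not handle.

```python
import os
import re


def _read_text(path, default=None):
    """Return the stripped contents of a sysfs file, or default if it does not exist."""
    if not os.path.exists(path):
        return default
    with open(path) as f:
        return f.read().strip()


def read_hwmon_voltages(hwmon_root="/sys/class/hwmon"):
    """Return {(chip_name, channel_label): volts} for every hwmon voltage input found."""
    readings = {}
    if not os.path.isdir(hwmon_root):
        return readings
    for chip in sorted(os.listdir(hwmon_root)):
        chip_dir = os.path.join(hwmon_root, chip)
        if not os.path.isdir(chip_dir):
            continue
        name = _read_text(os.path.join(chip_dir, "name"), default=chip)
        for entry in sorted(os.listdir(chip_dir)):
            match = re.fullmatch(r"(in\d+)_input", entry)  # voltage channels only
            if not match:
                continue
            channel = match.group(1)
            label = _read_text(os.path.join(chip_dir, channel + "_label"), default=channel)
            millivolts = int(_read_text(os.path.join(chip_dir, entry)))
            readings[(name, label)] = millivolts / 1000.0
    return readings
```

Usage: iterate over read_hwmon_voltages().items() and look for the channel your board's sensor driver labels (or documents) as the core voltage.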
     
  4. PPB

    PPB Golden Member

    Joined:
    Jul 5, 2013
    Messages:
    1,090
    Likes Received:
    134
    Stilt, to make my motherboard choice easier: if I'm targeting a 3.5 GHz all-core clock on a 1700 (the second critical point, beyond which the voltage scales exponentially with clock), what maximum sustained current should I look for in VRM designs? Also, do you have any insight into the VRM designs on B350 motherboards, considering the most powerful VRM designs tend to go into the highest-end X370 boards? Thanks in advance.
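A rough way to turn a power target into a VRM current figure is I = P / V on the core rail, with the power scaled from a known operating point using the first-order CMOS relation P ∝ V²·f. This is a sketch under that assumption only: it ignores leakage, which grows faster than V² at high voltage, so it underestimates. The reference point and target values below are purely hypothetical, not measurements from this thread.

```python
def scale_power(p_ref_w, v_ref, f_ref_mhz, v_new, f_new_mhz):
    """First-order dynamic power scaling, P ~ V^2 * f (leakage ignored)."""
    return p_ref_w * (v_new / v_ref) ** 2 * (f_new_mhz / f_ref_mhz)


def vrm_current_amps(power_w, vcore_v, margin=1.0):
    """Sustained output current on the core rail, I = P / V, times a safety margin."""
    if vcore_v <= 0:
        raise ValueError("vcore_v must be positive")
    return power_w / vcore_v * margin


# Hypothetical reference point and 3.5 GHz target, for illustration only.
p_target = scale_power(p_ref_w=65.0, v_ref=1.00, f_ref_mhz=3000, v_new=1.25, f_new_mhz=3500)
amps = vrm_current_amps(p_target, 1.25, margin=1.2)
print(f"~{p_target:.0f} W core power, ~{amps:.0f} A with a 20 % margin")
```

Plugging in your own measured power at a known voltage and clock gives a first estimate to compare against a board's VRM current rating.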

     
  5. iBoMbY

    iBoMbY Member

    Joined:
    Nov 23, 2016
    Messages:
    100
    Likes Received:
    56
    Thanks. That looks interesting, insofar as the cache-to-logical-core grouping seems to be wrong... This is how it looks on my i7-3770K:

    Code:
    Logical Processor to Cache Map:
    **------  Data Cache          0, Level 1,   32 KB, Assoc   8, LineSize  64
    **------  Instruction Cache   0, Level 1,   32 KB, Assoc   8, LineSize  64
    **------  Unified Cache       0, Level 2,  256 KB, Assoc   8, LineSize  64
    ********  Unified Cache       1, Level 3,    8 MB, Assoc  16, LineSize  64
    --**----  Data Cache          1, Level 1,   32 KB, Assoc   8, LineSize  64
    --**----  Instruction Cache   1, Level 1,   32 KB, Assoc   8, LineSize  64
    --**----  Unified Cache       2, Level 2,  256 KB, Assoc   8, LineSize  64
    ----**--  Data Cache          2, Level 1,   32 KB, Assoc   8, LineSize  64
    ----**--  Instruction Cache   2, Level 1,   32 KB, Assoc   8, LineSize  64
    ----**--  Unified Cache       3, Level 2,  256 KB, Assoc   8, LineSize  64
    ------**  Data Cache          3, Level 1,   32 KB, Assoc   8, LineSize  64
    ------**  Instruction Cache   3, Level 1,   32 KB, Assoc   8, LineSize  64
    ------**  Unified Cache       4, Level 2,  256 KB, Assoc   8, LineSize  64
    
    So code that takes this into account would assume each logical core has its own cache, with a completely wrong cache size. A NUMA group per CCX would also make sense. This could explain some of the bad results.

    Edit: This is what I think Ryzen results should look like:

    Code:
    Logical Processor to Cache Map:
    **--------------  Data Cache          0, Level 1,   32 KB, Assoc   8, LineSize  64
    **--------------  Instruction Cache   0, Level 1,   64 KB, Assoc   4, LineSize  64
    **--------------  Unified Cache       0, Level 2,  512 KB, Assoc   8, LineSize  64
    ********--------  Unified Cache       1, Level 3,    8 MB, Assoc  16, LineSize  64
    --**------------  Data Cache          1, Level 1,   32 KB, Assoc   8, LineSize  64
    --**------------  Instruction Cache   1, Level 1,   64 KB, Assoc   4, LineSize  64
    --**------------  Unified Cache       2, Level 2,  512 KB, Assoc   8, LineSize  64
    ----**----------  Data Cache          2, Level 1,   32 KB, Assoc   8, LineSize  64
    ----**----------  Instruction Cache   2, Level 1,   64 KB, Assoc   4, LineSize  64
    ----**----------  Unified Cache       3, Level 2,  512 KB, Assoc   8, LineSize  64
    ------**--------  Data Cache          3, Level 1,   32 KB, Assoc   8, LineSize  64
    ------**--------  Instruction Cache   3, Level 1,   64 KB, Assoc   4, LineSize  64
    ------**--------  Unified Cache       4, Level 2,  512 KB, Assoc   8, LineSize  64
    --------**------  Data Cache          4, Level 1,   32 KB, Assoc   8, LineSize  64
    --------**------  Instruction Cache   4, Level 1,   64 KB, Assoc   4, LineSize  64
    --------**------  Unified Cache       5, Level 2,  512 KB, Assoc   8, LineSize  64
    --------********  Unified Cache       6, Level 3,    8 MB, Assoc  16, LineSize  64
    ----------**----  Data Cache          5, Level 1,   32 KB, Assoc   8, LineSize  64
    ----------**----  Instruction Cache   5, Level 1,   64 KB, Assoc   4, LineSize  64
    ----------**----  Unified Cache       7, Level 2,  512 KB, Assoc   8, LineSize  64
    ------------**--  Data Cache          6, Level 1,   32 KB, Assoc   8, LineSize  64
    ------------**--  Instruction Cache   6, Level 1,   64 KB, Assoc   4, LineSize  64
    ------------**--  Unified Cache       8, Level 2,  512 KB, Assoc   8, LineSize  64
    --------------**  Data Cache          7, Level 1,   32 KB, Assoc   8, LineSize  64
    --------------**  Instruction Cache   7, Level 1,   64 KB, Assoc   4, LineSize  64
    --------------**  Unified Cache       9, Level 2,  512 KB, Assoc   8, LineSize  64
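On Linux, the kernel reports which logical CPUs share each cache under /sys/devices/system/cpu/cpuN/cache/indexM/ (files such as level and shared_cpu_list). A minimal sketch of how code could derive one group per shared L3, i.e. the per-CCX grouping suggested above, assuming the kernel reports the topology correctly; on an 8-core Ryzen with SMT one would then expect two groups of eight logical CPUs.

```python
import os
import re


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as '0-3,8-11' into a sorted list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return sorted(cpus)


def l3_groups(sysfs_root="/sys/devices/system/cpu"):
    """Return the distinct tuples of logical CPUs that share an L3 cache."""
    groups = set()
    for entry in os.listdir(sysfs_root):
        if not re.fullmatch(r"cpu\d+", entry):
            continue  # skip cpufreq/, cpuidle/, online, ...
        cache_dir = os.path.join(sysfs_root, entry, "cache")
        if not os.path.isdir(cache_dir):
            continue
        for index in os.listdir(cache_dir):
            index_dir = os.path.join(cache_dir, index)
            level_path = os.path.join(index_dir, "level")
            if not index.startswith("index") or not os.path.isfile(level_path):
                continue
            with open(level_path) as f:
                if f.read().strip() != "3":
                    continue
            with open(os.path.join(index_dir, "shared_cpu_list")) as f:
                groups.add(tuple(parse_cpu_list(f.read())))
    return sorted(groups)
```

Each returned tuple is a candidate scheduling group; a wrong or missing shared_cpu_list would show up directly as one group per logical CPU.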
    
     
    #30 iBoMbY, Mar 2, 2017
  6. .vodka

    .vodka Senior member

    Joined:
    Dec 5, 2014
    Messages:
    804
    Likes Received:
    615
    Considering it's a platform fresh out of the oven, it's quite a strong start, even with software tuned for Intel's architectures.

    With lots of BIOS fixes and updates to be issued this year, and as software gets optimized for Zen's particular quirks here and there, I can easily see its performance improving over time.

    Big thanks for the results and all the information, Stilt, much appreciated.
     
    #31 .vodka, Mar 2, 2017
  7. Greyguy1948

    Greyguy1948 Member

    Joined:
    Nov 29, 2008
    Messages:
    92
    Likes Received:
    12
  8. The Stilt

    The Stilt Golden Member

    Joined:
    Dec 5, 2015
    Messages:
    1,202
    Likes Received:
    1,363
  9. Greyguy1948

    Greyguy1948 Member

    Joined:
    Nov 29, 2008
    Messages:
    92
    Likes Received:
    12
    Very interesting info about SMT. Would this have any effect in games?
    3D Euler is also tested at Techreport.co, and the result is not very good for Ryzen. Should SMT maybe be turned off?
     
  10. zir_blazer

    zir_blazer Senior member

    Joined:
    Jun 6, 2013
    Messages:
    853
    Likes Received:
    6
    Hi The Stilt, I just noticed you are still active, and, as usual, decided to ask you about some obscure features...

    Do you know if AVIC (AMD's counterpart to Intel's APICv for APIC virtualization, present on the HEDT platform since Ivy Bridge-E but omitted entirely from consumer LGA 1155/1150/1151) is or will be present in all Ryzen parts? So far, thanks to an lscpu dump from the guy at Phoronix, I noticed that at least on the Ryzen 7 1800X the avic CPU flag is present: http://openbenchmarking.org/system/1703021-RI-AMDZEN08075/Ryzen 7 1800X/lscpu
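Checking a given part for the flag only needs the flags line from /proc/cpuinfo (or the "Flags:" line of an lscpu dump like the one linked). A minimal sketch; note that the flag only says the CPU supports AVIC, and whether the hypervisor actually uses it is a separate question.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line.

    Accepts /proc/cpuinfo ("flags\t\t: ...") as well as lscpu output ("Flags: ...").
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "flags":
            return set(value.split())
    return set()


def has_avic(cpuinfo_text):
    """True if the CPU advertises AMD's AVIC (the 'avic' flag)."""
    return "avic" in cpu_flags(cpuinfo_text)
```

Usage: open /proc/cpuinfo, read it and pass the text to has_avic(), or paste the saved lscpu output from another machine.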

    Another thing I'm interested in is PCIe ACS (Access Control Services) support, both on Ryzen itself and on the chipset. It is a useful feature for PCI/VGA passthrough in virtualized environments because it redirects PCIe peer-to-peer transfers upstream and forces everything to go through the IOMMU, thus providing proper device isolation (otherwise, devices could bypass the IOMMU through PCIe P2P, which is not intended). Intel HEDT also has it, and again, it's omitted from the consumer processors' PCIe controllers, although the consumer chipsets do support it. Sadly, I have no idea how to specifically check for ACS support.
    This would give potential AM4 passthrough users an idea of how good the default IOMMU grouping on the Ryzen AM4 platform should be. More info here, if you're interested: http://vfio.blogspot.com.ar/2014/08/iommu-groups-inside-and-out.html
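The resulting IOMMU grouping can be read directly from sysfs on Linux: each group is a numbered directory under /sys/kernel/iommu_groups/ whose devices/ subdirectory lists the PCI addresses in that group. A minimal sketch, assuming that layout; an empty result means the IOMMU is disabled or unsupported.

```python
import os


def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Map IOMMU group number -> sorted list of PCI addresses in that group."""
    groups = {}
    if not os.path.isdir(root):
        return groups  # IOMMU disabled or not supported
    for group in os.listdir(root):
        devices_dir = os.path.join(root, group, "devices")
        if group.isdigit() and os.path.isdir(devices_dir):
            groups[int(group)] = sorted(os.listdir(devices_dir))
    return groups
```

A device that shares its group with unrelated devices cannot be handed to a guest on its own, so small, well-separated groups are what passthrough users hope to see.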

    If you could also get lspci -vvv and lspci -tv output from Linux on Ryzen, that would be even better, to know the platform in detail. There are also some minor features, like FLR (Function Level Reset) on chipset devices, which would be useful to know about. Those features would add a lot of flexibility if supported.
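Both questions can be answered from the lspci -vvv text itself: a function that implements ACS lists an "Access Control Services" capability, and PCIe FLR support appears as FLReset+ in the DevCap field. A minimal parsing sketch, assuming the pciutils text layout (which can vary between versions; DevCap wraps onto a continuation line, so the parser remembers the last field name). It only checks the PCIe DevCap field for FLR. lspci usually needs root to print capabilities; otherwise they show up as "<access denied>".

```python
def pcie_features(lspci_vvv_text):
    """Return {slot: {"acs": bool, "flr": bool}} parsed from `lspci -vvv` output."""
    result = {}
    slot = None
    field = None  # most recent "Key:" seen, so wrapped continuation lines keep their context
    for line in lspci_vvv_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if not line[0].isspace():
            # Device header, e.g. "00:01.1 PCI bridge: ..."
            slot = tokens[0]
            field = None
            result[slot] = {"acs": False, "flr": False}
            continue
        if slot is None:
            continue
        if tokens[0].endswith(":"):
            field = tokens[0][:-1]  # e.g. "Capabilities", "DevCap", "DevCtl"
        if field == "Capabilities" and "Access Control Services" in line:
            result[slot]["acs"] = True
        if field == "DevCap" and "FLReset+" in tokens:
            result[slot]["flr"] = True
    return result
```

Usage: capture the text with subprocess.run(["lspci", "-vvv"], capture_output=True, text=True).stdout (as root) and pass it in.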
     
  11. The Stilt

    The Stilt Golden Member

    Joined:
    Dec 5, 2015
    Messages:
    1,202
    Likes Received:
    1,363
    CPU at fixed 3.6GHz speed.
    R9 Nano w/ the settings you specified in your thread.

    [​IMG]
     
  12. lolfail9001

    lolfail9001 Senior member

    Joined:
    Sep 9, 2016
    Messages:
    915
    Likes Received:
    308
    Uhhhh
    [​IMG]
    That result is a disaster.
     
  13. MajinCry

    MajinCry Golden Member

    Joined:
    Jul 28, 2015
    Messages:
    1,883
    Likes Received:
    297

    Well, that's that. Ryzen, at least from that result, has not rectified AMD's draw call deficit. Damn. I was hoping that they would have finally done it.

    Edit: What driver version are you on?
     
  14. The Stilt

    The Stilt Golden Member

    Joined:
    Dec 5, 2015
    Messages:
    1,202
    Likes Received:
    1,363
    As I said in the OP, dLDOs for the main blocks (cores, caches, data fabric) are NOT used in consumer Zeppelin parts. ZP-A0c was the last version that had them enabled. They are permanently disabled with a fuse config on retail consumer parts (i.e. ZP-B1).
    The only reason Zeppelin features dLDOs in the first place is that it is a server design. They boost efficiency, make binning easier and reduce the BOM in MCM platforms. Without the integrated regulators you would need four separate dual-plane VRMs (for Naples), and even then you wouldn't be able to adjust the voltages for the different blocks individually.

    If there are any integrated regulators in Carrizo or Bristol Ridge, they definitely are not for the main planes either. You can tell this by comparing the driven SVI data and the voltage calibration (command) values.

    [​IMG]

    Here's the measured VDDCR_CPU power consumption in Normal & OC-Mode, with fixed voltage.
    The VRM output voltage was locked to 1.3250V, while the actual voltage request in normal mode is 1.26250V for 3.6GHz (P0 Pstate).
    If dLDOs are used in normal mode like you say, why isn't the power consumption significantly lower than in OC-Mode?
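As a rough first-order check of how large that difference should be, assume only dynamic current (I = α·C·V·f, proportional to core voltage at a fixed clock) and an LDO whose input current equals its output current. With the rail locked at 1.3250 V, an LDO feeding the cores 1.26250 V would make the rail power V_rail·I(V_core) instead of V_rail·I(V_rail). This is a simplified model, not a description of AMD's dLDO implementation; leakage, which drops faster than linearly with voltage, would make the real difference larger.

```python
V_RAIL = 1.3250  # VRM output locked for the test (V)
V_REQUEST = 1.26250  # normal-mode request for 3.6 GHz, P0 (V)


def rail_power_ratio(v_rail, v_core):
    """Rail power with an ideal LDO feeding the cores v_core, relative to feeding them v_rail directly.

    Dynamic current I = alpha*C*V*f is proportional to V at a fixed clock, and an LDO's input
    current is (almost) its output current, so P_rail = v_rail * I(v_core) instead of v_rail * I(v_rail).
    """
    return v_core / v_rail


def core_power_ratio(v_rail, v_core):
    """Power dissipated in the cores themselves, P = alpha*C*V^2*f, at v_core relative to v_rail."""
    return (v_core / v_rail) ** 2


print(f"rail power: {100 * (1 - rail_power_ratio(V_RAIL, V_REQUEST)):.1f} % lower")
print(f"core power: {100 * (1 - core_power_ratio(V_RAIL, V_REQUEST)):.1f} % lower")
```

Under this model, an active dLDO in normal mode would lower the measured VDDCR_CPU power by roughly 4.7 % from the dynamic part alone (about 9.2 % less power in the cores themselves), with leakage savings on top.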

    Activating OC-Mode WILL INCREASE the power consumption, however it has nothing to do with the dLDOs. The actual reason can be found in the OP.
     
  15. dogen1

    dogen1 Senior member

    Joined:
    Oct 14, 2014
    Messages:
    646
    Likes Received:
    29
    I wonder what the result would be with an nvidia card.
     
  16. MajinCry

    MajinCry Golden Member

    Joined:
    Jul 28, 2015
    Messages:
    1,883
    Likes Received:
    297
    Unfortunately, NVidia's driver is optimized for synthetic draw call benchmarks. If the same draw call is made for the entire scene (no lights, no shadows, no different objects), the driver essentially performs a very weak form of instancing.

    Only kept them in the benchmark because why not.
     
  17. dogen1

    dogen1 Senior member

    Joined:
    Oct 14, 2014
    Messages:
    646
    Likes Received:
    29
    I think it's more like the driver is able to take advantage of the situation. Their optimizations surely help in games where they're applicable.
     
  18. MajinCry

    MajinCry Golden Member

    Joined:
    Jul 28, 2015
    Messages:
    1,883
    Likes Received:
    297
    The necessary situation for this "optimization" to come into play never occurs in games. As soon as a different draw call is made, the performance gain quickly dissipates. Any game that has a variety of objects, any lights, any shadows, any materials, etc., will not be able to take advantage of this.

    It's purely an optimization for synthetic benches.
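To illustrate the scenario described here, a minimal sketch of merging runs of identical consecutive draw calls into a single instanced call. The DrawCall type and batch_consecutive function are hypothetical and this is not NVIDIA's driver logic, only the general idea: a synthetic loop of identical calls collapses into one batch, while a single change of material between calls leaves nothing to merge.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class DrawCall:
    mesh: str
    material: str
    shader: str


def batch_consecutive(calls):
    """Merge runs of identical consecutive draw calls into (call, instance_count) pairs."""
    return [(call, sum(1 for _ in run)) for call, run in groupby(calls)]


grey = DrawCall("cube", "grey", "basic")
red = DrawCall("cube", "red", "basic")
print(len(batch_consecutive([grey] * 1000)))      # synthetic benchmark: one batch
print(len(batch_consecutive([grey, red] * 500)))  # alternating state: nothing merges
```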
     
  19. bjt2

    bjt2 Senior member

    Joined:
    Sep 11, 2016
    Messages:
    784
    Likes Received:
    180
    Maybe it's a problem in early BIOSes. The Ars Technica article (https://arstechnica.com/gadgets/201...n-finally-an-architecture-that-can-compete/3/), based on the official slides, mentions the dLDO. If it were disabled on retail parts, they wouldn't have mentioned it, do you agree? Are your conclusions based on late ES or retail samples? Beta BIOS or final BIOS?
     
  20. dogen1

    dogen1 Senior member

    Joined:
    Oct 14, 2014
    Messages:
    646
    Likes Received:
    29
    Maybe not in the vast majority of games, but I'd prefer my driver to "auto-optimize" games that don't do this type of batching by themselves.

    Maybe it screws up benches, but I have to wonder why AMD doesn't have this type of functionality in its drivers.
     
  21. MajinCry

    MajinCry Golden Member

    Joined:
    Jul 28, 2015
    Messages:
    1,883
    Likes Received:
    297
    Because it's useless. If any other draw calls are made by the game, the optimization fails. I repeat, this will not have any use in games.

    http://enbseries.enbdev.com/forum/viewtopic.php?f=17&t=4869#p69741
    For this to be used, the game must not have any lighting, any shadows, any shaders, any materials, any decals, and any other meshes.

    You won't find this in games. Maybe if you go back to the 80s (e.g., Asteroids), but not even DOS-era games would qualify.
     
  22. lolfail9001

    lolfail9001 Senior member

    Joined:
    Sep 9, 2016
    Messages:
    915
    Likes Received:
    308
  23. The Stilt

    The Stilt Golden Member

    Joined:
    Dec 5, 2015
    Messages:
    1,202
    Likes Received:
    1,363
    XFR is indeed basically a new name for the highest all-core and single-core boost states.
    The only difference is that the frequency changes occur faster than ever before, and because of that the granularity is not as large as one could expect.
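The behaviour described above amounts to a two-step selection rather than a GPU-style table of many intermediate states. A minimal sketch, where the frequencies and the core-count threshold (st_max_cores) are hypothetical parameters; real boost firmware also weighs power, current and temperature limits, which this ignores.

```python
def boost_clock_mhz(active_cores, st_mhz, all_core_mhz, st_max_cores=2):
    """Two-step boost: st_mhz while at most st_max_cores cores are active, else all_core_mhz.

    All values are hypothetical inputs; real firmware also checks power, current and
    temperature limits before granting either state.
    """
    if active_cores < 1:
        raise ValueError("active_cores must be at least 1")
    return st_mhz if active_cores <= st_max_cores else all_core_mhz


# Only two distinct outcomes, however many cores are loaded.
for n in (1, 2, 4, 8):
    print(n, boost_clock_mhz(n, st_mhz=4000, all_core_mhz=3700))
```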
     
  24. thepaleobiker

    thepaleobiker Member

    Joined:
    Feb 22, 2017
    Messages:
    146
    Likes Received:
    45
    Potentially an optimization issue, given that older uArches (Intel's Core series and AMD's FX cores) are doing well while the day-old Ryzen performs poorly. Some software might have uArch-specific optimizations.
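One way uArch-specific optimizations can miss a brand-new core is dispatch on CPU vendor and family with a generic fallback. A hypothetical sketch: the table and code-path names are made up, while the family numbers are the CPUID families shown in the "cpu family" field of /proc/cpuinfo (6 for Intel Core, 0x15 for AMD's Bulldozer-derived FX, 0x17 for Zen).

```python
# Hypothetical table of code paths a program might have been tuned for.
TUNED_PATHS = {
    ("GenuineIntel", 0x06): "intel_core_path",
    ("AuthenticAMD", 0x15): "amd_bulldozer_path",
}


def vendor_and_family(cpuinfo_text):
    """Return (vendor_id, cpu family) from /proc/cpuinfo text."""
    vendor, family = None, None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        if key == "vendor_id" and vendor is None:
            vendor = value.strip()
        elif key == "cpu family" and family is None:
            family = int(value.strip())
    return vendor, family


def choose_code_path(vendor, family):
    """Pick a tuned path by (vendor, family); anything unknown, e.g. Zen's 0x17, gets the generic one."""
    return TUNED_PATHS.get((vendor, family), "generic_path")
```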

    Hopefully, in the next few weeks/months, we will see an improvement or at least an answer to these questions. I've not had a chance to read the Reddit AMA thread by Lisa :(