Ryzen: Strictly technical

Discussion in 'CPUs and Overclocking' started by The Stilt, Mar 2, 2017.

  1. HurleyBird

    HurleyBird Golden Member

    Absolutely awesome, Stilt! I'm going to go as far as to say that you blew every review site out of the water with this one.

    Out of curiosity, was SMT/HT enabled or disabled for your IPC measurements?
     
  2. majord

    majord Senior member

    Awesome work as expected :p

    I really hope this changes at some point.
     
  3. french toast

    french toast Senior member

    Really nice data, thanks, Stilt.
     
  4. Jan Olšan

    Jan Olšan Member

    That Cinebench at 30W looks great (not bad at 45W either). These chips should probably be great in the MCM Opterons, right? Probably much more suited for that than for 3.6-4.1 GHz desktop.
     
  5. jihe

    jihe Senior member

    A Cinebench score of 1000 at 35 W is insane.
     
  6. The Stilt

    The Stilt Golden Member

    Technically there should be no difference to the "8 Thread" figure, even though SMT was enabled. I'll double-check anyway.
     
  7. The Stilt

    The Stilt Golden Member

    All of the designs had their multithreading (SMT/CMT) enabled during the testing, since that is what end users would be running.

    I've verified some of the individual results with multithreading disabled on all of them, and the differences fell within the standard deviation of the results (the final results being 3RA, i.e. three-run averages).
    IIRC NBody showed a slight improvement on Excavator with CMT disabled, but that's pretty much irrelevant since such a configuration is not available by default.
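
    A minimal sketch of the "within the standard deviation" check described above. The scores are hypothetical placeholders, not measurements from this thread, and `within_noise` is an illustrative name; the comparison against one sample standard deviation of the baseline is an assumption about how such a check might be done.

```python
from statistics import mean, stdev

def within_noise(baseline_runs, variant_runs):
    """Return True if the variant's mean lies within one sample standard
    deviation of the baseline's mean."""
    base_avg = mean(baseline_runs)
    base_sd = stdev(baseline_runs)
    return abs(mean(variant_runs) - base_avg) <= base_sd

# Hypothetical three-run results (multithreading on vs. off).
smt_on = [1000.0, 1004.0, 996.0]
smt_off = [1002.0, 999.0, 1003.0]

print("3-run average, SMT on: ", mean(smt_on))
print("3-run average, SMT off:", round(mean(smt_off), 2))
print("difference within noise:", within_noise(smt_on, smt_off))
```

    With only three runs per configuration the standard deviation is a rough estimate, so this kind of check can only rule out large differences.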
     
  8. LTC8K6

    LTC8K6 Lifer

    How does it compare to Intel's low-power 8c/16t scores?
     
  9. jihe

    jihe Senior member

    The E5-2620 v4 gets about the same Cinebench score, and it's an 85 W TDP part.
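
    The perf/W arithmetic behind that comparison, as a rough sketch. It assumes both parts score exactly 1000 (the thread only says "about the same") and treats the 85 W TDP rating as if it were power draw, which it is not; `points_per_watt` is an illustrative helper.

```python
def points_per_watt(score, watts):
    """Benchmark points delivered per watt of the stated power figure."""
    return score / watts

# Figures quoted in the thread; both scores are rounded to 1000 here.
ryzen_35w = points_per_watt(1000, 35)
xeon_85w = points_per_watt(1000, 85)  # 85 W is a TDP rating, not a measurement

print(f"35 W config: {ryzen_35w:.1f} pts/W")
print(f"85 W part:   {xeon_85w:.1f} pts/W")
print(f"ratio:       {ryzen_35w / xeon_85w:.2f}x")
```

    Under those assumptions the ratio is simply 85/35, roughly 2.4x, so the result is only as good as the two power figures.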
     
  10. R0H1T

    R0H1T Platinum Member

    Extremely favorably: superb perf/W, based on the evidence presented thus far.
     
  11. bjt2

    bjt2 Senior member

    You should compare with an 8-core. It's no wonder that an 8-core, even at 35 W, outperforms a 35 W 4c, since this is an extremely parallelizable task... Is there a low-power 8c part (I am thinking of low-power Xeons for dense datacenters)?
     
  12. iBoMbY

    iBoMbY Member

    About the CCX scheduling in Windows: does anyone dare to try this setting?

    Code:
    bcdedit.exe /set groupsize 8
    
    According to this article, it could force Windows to split Ryzen into two NUMA groups of 8 logical cores, one per CCX. My Ryzen will not arrive before tomorrow, so I can't check if this still works, or if it changes anything.

    Edit: I removed the "bcdedit.exe /deletevalue groupaware" setting, since it could have a negative impact.

    Edit2: By default, the Windows scheduler forces all threads of a process onto one of the NUMA groups. This can of course have a negative impact! If you start 8 threads in IntelBurnTest, for example, it only uses 4 logical cores, not 8 like it normally does. The groupaware setting forces everything onto the second group; you definitely don't want that.

    Edit3: Okay, I wouldn't recommend my bcdedit processor group settings anymore, because it forces all threads of one process to one of the groups, so you would effectively always only use half the CPU with a single application. The setting is too strong.
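
    A simplified model of the effect described in the edits above, for illustration only. It assumes logical CPUs are numbered consecutively and split into equal consecutive groups, and that a process is confined to a single group; it does not call any Windows API, and both function names are made up for this sketch.

```python
def split_into_groups(logical_cpus, group_size):
    """Partition logical CPU indices 0..logical_cpus-1 into consecutive groups."""
    return [list(range(start, min(start + group_size, logical_cpus)))
            for start in range(0, logical_cpus, group_size)]

def usable_cpus(groups, group_index):
    """Logical CPUs a process sees when all its threads are confined to one group."""
    return groups[group_index]

# 16 logical CPUs with a group size of 8 (the setting proposed for Ryzen).
groups = split_into_groups(16, 8)
for index, members in enumerate(groups):
    print(f"group {index}: {members}")

confined = usable_cpus(groups, 0)
print(f"a single-group process can use {len(confined)} of 16 logical CPUs")
```

    This is the "half the CPU" problem in miniature: an application that is not written to span processor groups stays inside whichever group it was started in.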
     
    #87 iBoMbY, Mar 3, 2017
    Last edited: Mar 3, 2017
  13. The Stilt

    The Stilt Golden Member

    Noticed a funny thing with the L3 latency reported by AIDA.
    In Win 10 the latency is all over the place (19.7-56ns). In Win 7 the latency is 19.5 - 20.2ns, regardless of the settings used.
    Same setup, same version of AIDA (newest beta from 26th).
     
  14. imported_jjj

    imported_jjj Senior member

    How well does Win 7 work? Forgot to check for info on that.
     
  15. The Stilt

    The Stilt Golden Member

    Fine.
    Better (faster) than Win 10, IMO.
    The tricky part is installing it, unless you use optical media (due to the lack of USB drivers on the install media).
    Naturally you can integrate the USB drivers into the images, using DISM for example.
     
  16. inf64

    inf64 Platinum Member

    Have you done tests on Windows 7 with Ryzen? If you have, are there any performance differences between the two?
     
  17. imported_jjj

    imported_jjj Senior member

    That's great, thanks.
     
  18. The Stilt

    The Stilt Golden Member

    I've used Win 7 about 95% of the time on Ryzen.
    Only the actual performance evaluation was done with Win 10, as it wasn't up to me.
    The performance differences are minor but very consistent (in favor of Win 7).
     
  19. inf64

    inf64 Platinum Member

    Great, thanks!
    Do you think you could try what iBoMbY is suggesting with the bcdedit command? In theory it should split a 16T Ryzen into 2 NUMA groups.
     
  20. The Stilt

    The Stilt Golden Member

    I have to pass this one.
    I still need to validate several things, and tampering with such options would mandate re-installing the OS and everything else, regardless of whether they're fully reversible or not :(
     
  21. iBoMbY

    iBoMbY Member

    In theory it should be safe, and could be reversed with "bcdedit /deletevalue [name]". The settings only apply after rebooting. But yes, in theory this could lead to the system not booting, so a backup (or at least a restore point) would be advisable.
     
  22. MajinCry

    MajinCry Golden Member

    Would you be able to run that draw call benchmark in Win7, with a couple other tweaks?

    Apparently there have been motherboard BIOS updates that improved performance (faster DDR4 speeds), and disabling SMT also gives better performance. It would be good to see if there's a cumulative effect that boosts the draw call perf.
     
  23. The Stilt

    The Stilt Golden Member

    The most recent BIOSes don't have any low-level changes compared to the one I was using.
    I'll try it under Win 7.
     
  24. iBoMbY

    iBoMbY Member

    Okay, I just tried my bcdedit settings on my old Intel (with 4 instead of 8), and the results in Coreinfo are pretty interesting:

    Code:
    Logical to Physical Processor Map:
    Physical Processor 0 (Hyperthreaded):
    **--
    ----
    Physical Processor 1 (Hyperthreaded):
    --**
    ----
    Physical Processor 2 (Hyperthreaded):
    ----
    **--
    Physical Processor 3 (Hyperthreaded):
    ----
    --**
    
    Logical Processor to Socket Map:
    Socket 0:
    ****
    ----
    Socket 1:
    ----
    ****
    
    Logical Processor to NUMA Node Map:
    NUMA Node 0:
    ****
    ----
    NUMA Node 1:
    ----
    ****
    Calculating Cross-NUMA Node Access Cost...
                                           
    Approximate Cross-NUMA Node Access Cost (relative to fastest):
         00  01
    00: 1.5 1.1
    01: 1.0 1.0
    
    Logical Processor to Cache Map:
    Data Cache          0, Level 1,   32 KB, Assoc   8, LineSize  64
    **--
    ----
    Instruction Cache   0, Level 1,   32 KB, Assoc   8, LineSize  64
    **--
    ----
    Unified Cache       0, Level 2,  256 KB, Assoc   8, LineSize  64
    **--
    ----
    Unified Cache       1, Level 3,    8 MB, Assoc  16, LineSize  64
    ****
    ----
    Data Cache          1, Level 1,   32 KB, Assoc   8, LineSize  64
    --**
    ----
    Instruction Cache   1, Level 1,   32 KB, Assoc   8, LineSize  64
    --**
    ----
    Unified Cache       2, Level 2,  256 KB, Assoc   8, LineSize  64
    --**
    ----
    Data Cache          2, Level 1,   32 KB, Assoc   8, LineSize  64
    ----
    **--
    Instruction Cache   2, Level 1,   32 KB, Assoc   8, LineSize  64
    ----
    **--
    Unified Cache       3, Level 2,  256 KB, Assoc   8, LineSize  64
    ----
    **--
    Unified Cache       4, Level 3,    8 MB, Assoc  16, LineSize  64
    ----
    ****
    Data Cache          3, Level 1,   32 KB, Assoc   8, LineSize  64
    ----
    --**
    Instruction Cache   3, Level 1,   32 KB, Assoc   8, LineSize  64
    ----
    --**
    Unified Cache       5, Level 2,  256 KB, Assoc   8, LineSize  64
    ----
    --**
    
    Logical Processor to Group Map:
    Group 0:
    ****
    ----
    Group 1:
    ----
    ****
    

    It also has an influence on the Cache mapping, for one, and the system looks pretty much like it has two physical processors now. You also get a new option in the task manager, to view the CPU load per NUMA node. So, it should have some influence on how Windows handles stuff ...

    Edit: The cache size is wrong now, but it is possible to set the L2 and L3 cache size via some registry settings, I'll try that now.

    Edit2: The cache setting in the registry doesn't change anything. The bcdedit settings can be removed with:

    Code:
    bcdedit.exe /deletevalue groupsize
    bcdedit.exe /deletevalue groupaware
    
    After rebooting, everything looks normal again. It doesn't look like the group settings cause any instability (as long as you set the group size to a value divisible by 2). The Windows scheduling is definitely influenced.
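
    For reading the Coreinfo maps above, a small sketch that decodes the asterisk masks. It assumes each row of a mask corresponds to one processor group and that the leftmost character is that group's lowest-numbered logical processor; `decode_mask` is an illustrative helper, not part of Coreinfo.

```python
def decode_mask(rows):
    """Turn per-group mask rows such as ["**--", "----"] into (group, cpu) pairs."""
    return [(group, cpu)
            for group, row in enumerate(rows)
            for cpu, flag in enumerate(row.strip())
            if flag == "*"]

# "Physical Processor 3" from the output above: second row, last two positions.
print(decode_mask(["----", "--**"]))

# "Socket 0" from the output above: the whole first row.
print(decode_mask(["****", "----"]))
```

    Read this way, every cache, socket and NUMA entry in the listing sits entirely inside one row, which matches the observation that the system looks like it has two physical processors.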
     
    #99 iBoMbY, Mar 3, 2017
    Last edited: Mar 3, 2017
  25. bjt2

    bjt2 Senior member

    AFAIK a restore point does not restore the BCD, so a bootable USB stick or DVD is needed. If the system doesn't boot, start from the USB stick or DVD, open the repair menu, go to the command prompt and use bcdedit to revert the settings...