Ryzen: Strictly technical

Discussion in 'CPUs and Overclocking' started by The Stilt, Mar 2, 2017.

  1. The Stilt

    The Stilt Golden Member

    Joined:
    Dec 5, 2015
    Messages:
    1,216
    Likes Received:
    1,415
    "Lazarus".
    ;)

    I think we'll know much more about the IOV by the time the server SKUs (Naples and Snowy Owl) hit the market. There is no way to launch the server platform without having the virtualization features fully functional and documented.
    Starting with Carrizo, AMD has put a surprisingly ample amount of resources into Linux.

    If I got to decide what AMD does with Zeppelin:

    - Iron out all of the existing shenanigans (obviously), where possible.
    - Rewrite the Turbo & XFR algorithms in the SMU: Turbo & XFR are maintained during "OC-Mode" operation, Turbo & XFR CPUFID/CPUDFSId/CPUVID are made user configurable
    - Start porting Zeppelin to 16nm FF+. Porting the design, and all of the re-tooling it involves, will be extremely costly; however, it would definitely be worth it if it allows hitting even 300MHz higher Fmax on average, which it most likely would. Release as a refresh, e.g. 1750, 1750X, 1850X.
     
  2. The Stilt

    All of the updates have been applied, however I need to try those fast tracks.
     
  3. The Stilt

    I might have some tools for that, but I'm not certain if they are compatible with Zeppelin. Need to check it out.
     
  4. JimmiG

    JimmiG Platinum Member

    Joined:
    Feb 24, 2005
    Messages:
    2,010
    Likes Received:
    100
    How likely are you to hit the "few cores" boost frequency (e.g. 4 GHz for 1800X)?

    I'm seeing 3.7 GHz pretty much constantly with my 1800X, even when running Furmark or Prime95 with one thread. Shouldn't it be hitting 4 GHz under such a workload? Maybe it's the Windows thread shuffling that causes more than 2 cores to be partially active at all times, blocking the boost?

    I don't think there are any problems with my cooling or power supply, since I'm constantly at the "all cores boost" of 3.7 GHz rather than the base 3.6 GHz. My CPU-Z score seems to mirror those in the reviews, both for single-threaded and 16 threads (2140 / 19203 respectively), which leads me to believe Boost behaves like this for everyone.
     
  5. Mockingbird

    Mockingbird Senior member

    Joined:
    Feb 12, 2017
    Messages:
    314
    Likes Received:
    188
    duplicated post
     
    #205 Mockingbird, Mar 4, 2017
    Last edited: Mar 4, 2017
  6. The Stilt

    I suggest you check the frequencies using the newest HWInfo rather than with CPU-Z.
    Also Prime95 is currently not working properly on Ryzen, so I suggest you try with another workload.

    But yeah, generally during a true single core workload you should be able to sustain 4.0 - 4.1GHz on 1800X.
     
  7. Mockingbird

    You should do a clean install because Coreinfo shows that something is not configured correctly.

    Other users report theirs look like this:

     
    #207 Mockingbird, Mar 4, 2017
    Last edited: Mar 4, 2017
  8. The Stilt

    Where would this "own place" be, in your opinion? :)
     
  9. cytg111

    cytg111 Diamond Member

    Joined:
    Mar 17, 2008
    Messages:
    3,363
    Likes Received:
    271
    I don't know: AnandTech's, Ars, your own site perhaps? I understand that politics would follow, and I do enjoy the objectivity of this review, so there's that...
    I am just saying I think it's good and I think you could get paid... that's all.
     
  10. imported_jjj

    imported_jjj Senior member

    Joined:
    Feb 14, 2009
    Messages:
    589
    Likes Received:
    399
    Computerbase got a 17% boost in avg FPS in Total War: Warhammer when disabling SMT.
    That was at 1080p; I'm not sure which GPU, as I don't speak German, but it was a Titan X or a GTX 1080.
     
  11. JimmiG

    I guess something is wrong then after all :(
    I'm seeing 4.1 GHz "Maximum" in HWInfo, but it's only sustaining 3.7 GHz.

    Annoyingly, it's impossible to even get trustworthy temperature readings, because the BIOS from February 28 shows 56C idle and 74C full load, while the one from February 24 shows 34C idle and 58C full load. I don't want to rip out the heatsink and buy a new one only to find it was a BIOS issue all along...
     
  12. bongey

    bongey Junior Member

    Joined:
    Mar 4, 2017
    Messages:
    2
    Likes Received:
    2
    Does it support 64-bit PCI addressing/Above 4G decoding?
    Trying to figure out if Xeon Phi will work in it.
     
  13. formulav8

    formulav8 Diamond Member

    Joined:
    Sep 18, 2000
    Messages:
    6,421
    Likes Received:
    194
    I could give you your own place on my local computer side-job site I have. (not listed :)

    Edit: This is a high profile site
     
    #213 formulav8, Mar 4, 2017
    Last edited: Mar 4, 2017
  14. gupsterg

    gupsterg Junior Member

    Joined:
    Mar 4, 2017
    Messages:
    6
    Likes Received:
    3
    @The Stilt

    Thank you for the article :).

    Any chance you can fill in the x.xGHz ACXFRC for an R7 1700?

    For example, for the 1700 SKU the clock configuration is as follows: 3.0GHz all core frequency (MACF), 3.7GHz single core frequency (MSCF), x.xGHz maximum all core XFR ceiling (ACXFRC) and 3.75GHz maximum single core XFR ceiling (SCXFRC).
     
  15. starheap

    starheap Junior Member

    Joined:
    Mar 4, 2017
    Messages:
    5
    Likes Received:
    0
    For those curious, mine was on a totally fresh install done on the Ryzen system, and I used a freshly downloaded ISO image from Microsoft.
     
  16. Mockingbird

    Two questions for @The Stilt

    1. Since the frequency of the data fabric is fixed to the memory frequency at a ratio of 1:2, does this mean that using faster memory would result in much faster performance and in what tasks would having faster fabric frequency be most beneficial?

    It seems that fixing the data fabric frequency to the memory frequency imposes a significant restriction on the data fabric. In the future, could the data fabric frequency be decoupled from the frequency of the memory controller, or could the ratio perhaps be changed to allow a higher data fabric frequency?

    2. Since the low clock speed is the result of Samsung 14nm LPP, and increasing frequency beyond Critical 2 would require significantly higher voltage, would it be beneficial for AMD to instead move production of its high-frequency products to TSMC?

    Or rather, should AMD be focused on increasing its IPC at the given frequency?
     
  17. Encrypted11

    Encrypted11 Junior Member

    Joined:
    Jul 26, 2013
    Messages:
    2
    Likes Received:
    0
    There seems to be no public record of IO testing on the internet, given the rushed reviews.

    Will you test the IO?
     
  18. starheap

    So I have a theory regarding this HLSL_Instancing test you people were doing. It is an x86, not x64, program. For example, if you run the CPU-Z bench with the x86 version, the performance is less than half that of the x64 version. So could it just be that Ryzen/AMD has bad x86 (32-bit) performance?

    The source code is included with it so technically one could attempt to recompile it for x64 to try and confirm this.
     
  19. Kromaatikse

    Kromaatikse Member

    Joined:
    Mar 4, 2017
    Messages:
    83
    Likes Received:
    167
    I doubt it. The same general deficit in graphics performance also shows up under Linux, where 64-bit code is more ubiquitous, and most of the games showing trouble are also 64-bit.

    IMHO all signs point to BIOS or driver problems, not a flaw in the CPU itself.
     
  20. Ajay

    Ajay Diamond Member

    Joined:
    Jan 8, 2001
    Messages:
    3,233
    Likes Received:
    268
    Hmm, all because of Windows' odd behavior vis-a-vis thread allocation. Windows NT, at least back to 3.5, switches threads between cores (CPUs, back then) according to some algorithm (it always looked random to me). I remember a spat on COMP.ARCH between some server dude and Dave Cutler over this on a dual-CPU system: Task Manager showed exactly 50% utilization on each CPU when running a single-threaded process. AMD, apparently, couldn't afford to design and implement two separate CPUs for client and server (with both being monolithic).

    re: 1) AMD had to know this, just based on the design. No easy way to fix it without sort of breaking windows. I think Linux is smarter about threads and data locality - probably why it performs so well with Ryzen (haven't looked at Linux scheduling in a long time, so AFAIK). Even within one CCX this will be an issue, since prefetches are to the private caches.

    re: 2) As Stilt pointed out, a ring 0 proggie can change this via core parking, or the /affinity switch can be used from the command prompt. I've been using Process Lasso for years to deal with this issue in some high-performance programs (and to manipulate priority levels). It sounds like MS may be waiting for the April launch of Redstone 2 ("Creators Update") to fix this within the Windows scheduler when a Zeppelin CPU is detected. I would guess that the fix may already be in the latest Windows "Fast Ring" updates, or will be in the next couple of weeks.
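Editor's sketch: the /affinity switch mentioned above takes a hexadecimal bitmask with one bit per logical processor. The helper below only builds that mask. It assumes logical CPUs 0-7 are the eight SMT threads of the first CCX, which is an assumption about how Windows enumerates an 8-core Ryzen with SMT on. The function name `affinity_mask` and the program name `game.exe` are hypothetical placeholders.

```python
def affinity_mask(logical_cpus):
    """Return the hex affinity mask (without 0x) that selects the given logical CPUs."""
    mask = 0
    for cpu in logical_cpus:
        if cpu < 0:
            raise ValueError("logical CPU index must be non-negative")
        mask |= 1 << cpu  # bit N of the mask corresponds to logical CPU N
    return format(mask, "X")


# Assumption: logical CPUs 0-7 belong to the first CCX (4 cores x 2 threads).
first_ccx = affinity_mask(range(8))
print(first_ccx)

# Hypothetical use from cmd.exe (game.exe is a placeholder name):
#   start /affinity FF game.exe
```

The same helper can build a mask for the other half of the chip (`affinity_mask(range(8, 16))`), under the same enumeration assumption.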

    I do wonder if AMD is planning on creating a monolithic design for 7nm. I don't know if their fabric can support > 12 cores with a single shared L3$, or if they can even afford to do that (since it must already be in development). If AMD is successful and is able to afford to spend more on CPU development, I think we'll see something even more impressive than Zen (well, in absolute terms).
     
  21. CrazyElf

    CrazyElf Member

    Joined:
    May 28, 2013
    Messages:
    83
    Likes Received:
    16
    I guess at this point we'll just have to wait for more applications to be optimized for this unique topology.

    The lesson is to minimize the communication between the 2 CCXs as much as possible. We may have games that use fewer than 4 cores just use 1 CCX, while the rest of the operating system uses the other. With SMT off and some decent RAM overclocks, we should actually get some decent gaming performance, perhaps even approaching the workstation benchmarks.


    Any idea what the speed is of the inter-CCX interconnects?

    From PCGH.de: http://www.pcgameshardware.de/Ryzen-7-1800X-CPU-265804/Tests/Test-Review-1222033/

    PCGH.de, like Hardware.fr, reports the same speed: about 22 GB/s, which seems slow. For comparison, the QPI links for Haswell-EP run at 9.6 GT/s, which works out to 38.4 GB/s (19.2 GB/s in each direction).
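Editor's sketch: the QPI figure above can be reproduced with simple arithmetic. It assumes a full-width QPI link carries 2 bytes of payload per transfer in each direction, and that the 38.4 GB/s number counts both directions together. Whether the measured 22 GB/s is a one-way or a combined figure is not stated in the reviews quoted here, so the comparison is only rough. The function name is hypothetical.

```python
def qpi_bandwidth_gb_s(transfer_rate_gt_s, payload_bytes_per_transfer=2, directions=2):
    """Peak QPI payload bandwidth in GB/s for a given transfer rate in GT/s."""
    return transfer_rate_gt_s * payload_bytes_per_transfer * directions


per_direction = qpi_bandwidth_gb_s(9.6, directions=1)
combined = qpi_bandwidth_gb_s(9.6)
print(f"{per_direction:.1f} GB/s per direction, {combined:.1f} GB/s combined")
```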

    There's got to be a latency penalty for using RAM as last level cache here.



    Yeah I really think that this CPU needed an L4 cache of some sort and a faster interconnect between the CCX.

    Once the BIOS fixes are in, we'll need to see if the memory overclocking is more sensitive to frequency or tight timings.
     
    #221 CrazyElf, Mar 4, 2017
    Last edited: Mar 4, 2017
  22. piesquared

    piesquared Golden Member

    Joined:
    Oct 16, 2006
    Messages:
    1,382
    Likes Received:
    198
    It will be interesting to hear developers take on this CPU.
     
  23. crashtech

    crashtech Diamond Member

    Joined:
    Jan 4, 2013
    Messages:
    6,696
    Likes Received:
    254
    Certainly Linux is not perfect, it has a scheduler, too. Time will tell if some patches can be applied to inform the scheduler of Ryzen's new architecture.
     
  24. deadhand

    deadhand Junior Member

    Joined:
    Mar 4, 2017
    Messages:
    18
    Likes Received:
    62
    I've spent some time working with two people ('Longcat' and 'iWalkingCorpse' from AMD discord) with Ryzen systems to test Valve's CS:GO map compiling tools on Ryzen. I believe the results support the notion that the inter-CCX fabric can cause potentially significant slowdowns when the scheduler is allowed to freely shuffle threads between CCXs.

    A while ago I discovered that if I use the map compiling tools across my dual E5-2680's (8 core / 16 thread SandyBridge-EP processors w/ 2.7 base, 3.5 boost clocks), I get negative scaling. By using an application called 'Process Lasso', I forced all 16 threads from 'vrad' (radiosity computations for lightmaps) and 'vvis' (generates visibility sets) to a single CPU. Performance went up dramatically. Given Ryzen's floor plan, I thought perhaps there would be a similar issue. Sure enough, there is, and here are the results:

    Note: In all cases, 'vrad' (the longest running program, by far) was set to use 8 threads. This was so it aligns nicely with a single CCX.

    Additionally, the results from the E5-2680 are not really intended to compare directly against the Ryzen CPUs, except to note that the benefit of having all threads mapped to the first 8 hardware threads is significantly greater on Ryzen than on my Intel CPU. Further, the dual-processor test was to show how these tools behave in a NUMA environment.

    EDIT: I should also note that the E5-2680 can do an all-core boost of around 3 GHz. I'm also using Registered ECC memory @ 1333 MHz per stick in quad channel (for each CPU), with a total of 16 modules.

    (Lower is better)

    [Image: chart of compile times by affinity configuration; not preserved]

    The green bars are when the thread affinity is set to a single CCX, or in the case of my E5-2680's, the first 8 hardware threads. While the scheduler still shuffles the threads around within it, it appears much faster.

    I should also note that the worse the results are, the more variance there is. I've taken the best of each set of results, but other runs on the 2 CPU / physical cores only test had a 135 point variance (908 seconds in the worst result), whereas the better performing tests (the green results) had significantly less variance between runs (~10 seconds between best and worst).

    Additionally, it's interesting to note that the R7-1700 is getting better results than the R7-1800X system on the cross-CCX tests (though this may be within the margin of error). The owner of the OC'd R7-1700 is using DDR4-3000 memory, while the owner of the R7-1800X is using DDR4-2400.
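Editor's sketch: if the data fabric really is tied to the memory at the 1:2 ratio raised earlier in the thread, the two memory kits imply different fabric clocks, which might contribute to the gap. The calculation below assumes the ratio applies to the effective DDR4 transfer rate (so DDR4-2400 gives a 1200 MHz fabric clock). That is an assumption, not something measured in these tests, and the function name is hypothetical.

```python
def fabric_clock_mhz(ddr_rate_mt_s, ratio=0.5):
    """Fabric clock in MHz, assuming it is a fixed fraction of the DDR transfer rate."""
    return ddr_rate_mt_s * ratio


slow = fabric_clock_mhz(2400)  # DDR4-2400 kit (R7-1800X system)
fast = fabric_clock_mhz(3000)  # DDR4-3000 kit (R7-1700 system)
print(f"DDR4-2400: {slow:.0f} MHz, DDR4-3000: {fast:.0f} MHz, ratio {fast / slow:.2f}x")
```

Under that assumption the DDR4-3000 system runs its fabric 25% faster, but the two machines also differ in CPU clocks, so this does not isolate the effect.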

    Lastly, please take these results with a grain of salt. Testing was difficult due to multiple people testing on different machines. I wouldn't post these results if there wasn't such an obvious spread in performance between the different affinity configurations. If anyone is interested in exact methodologies, I can post below or edit this post.

    Please also note that I chose these compile tools for testing as they are an excellent example of the kind of memory bottlenecks that can occur in multi-threading. They also do not scale well beyond 8 threads on 4 physical CPUs (though all tests I ran used the '-threads 8' flag to only allow vrad to use 8 main worker threads).
     
    #224 deadhand, Mar 4, 2017
    Last edited: Mar 5, 2017
  25. DisEnchantment

    Joined:
    Mar 3, 2017
    Messages:
    100
    Likes Received:
    91
    So this could probably explain why disabling SMT improves performance in non-threaded applications like games.

    With threaded workloads the shuffling across CCXs is not there, whereas in lightly threaded workloads what you mentioned above is a bottleneck. So it could explain why the massive multi-thread performance of Ryzen does not translate into game performance. It is actually a performance penalty.

    One question, though: would disabling a complete CCX actually help reduce the penalty? Could someone with a Ryzen chip please try running the gaming bench with one of the CCXs disabled and SMT disabled?
     
    #225 DisEnchantment, Mar 4, 2017
    Last edited: Mar 4, 2017