Ryzen: Strictly technical

Discussion in 'CPUs and Overclocking' started by The Stilt, Mar 2, 2017.

  1. Mockingbird

    Mockingbird Senior member

    Joined:
    Feb 12, 2017
    Messages:
    271
    Likes Received:
    167
    #226 Mockingbird, Mar 5, 2017
    Last edited: Mar 5, 2017
    T1beriu, lightmanek, deadhand and 2 others like this.
  2. Avalon

    Avalon Diamond Member

    Joined:
    Jul 16, 2001
    Messages:
    7,410
    Likes Received:
    13
    Thanks for the hard work Stilt!
     
  3. roybotnik

    roybotnik Junior Member

    Joined:
    Mar 4, 2017
    Messages:
    6
    Likes Received:
    9
    I ran the draw call benchmark on my Ryzen 1800X (OC'd to 4 GHz) and averaged 17 fps. I've been messing with this system for hours through numerous reboots, and at some point I was getting 14, but I'm pretty sure I was at 4 GHz then as well. The BIOS for the Asus Crosshair is really not in a good state. My voltages have been screwy, and I've given up on getting my DDR4-3200 RAM working past 2666 for now.

    Link: https://forums.anandtech.com/thread...call-performance.2499609/page-2#post-38776423
     
  4. R0H1T

    R0H1T Platinum Member

    Joined:
    Jan 12, 2013
    Messages:
    2,195
    Likes Received:
    65
    I see another Process Lasso fan. MS should really be ashamed of themselves, seeing how they can't seem to do half of what Bitsum achieved with their software. Then again, it's Windows (10) & it just works.
     
    Ajay likes this.
  5. starheap

    starheap Junior Member

    Joined:
    Mar 4, 2017
    Messages:
    5
    Likes Received:
    0
    You are not the only one with... problems. I also have an 1800X, and I can't get my memory past 2133 unless I do some convoluted process that involves swapping back and forth between two sets of RAM... (then I can set the memory up with Ryzen Master to get 2666). I have a Gigabyte Gaming 5...

    I've had 3 sets in this system, all with similar results (one 4-stick kit I returned to Newegg because half of the DIMMs were defective due to bad packaging, confirmed with a Skylake system). I think I'm going to try to order a cheap B350 motherboard and a cheap sub-3000 MHz 16 GB RAM kit (something on the supported memory list). This way I'll be able to better narrow down the problem. I bought the CPU + board at Micro Center, so if it's my CPU or board I'll be exchanging them. If I have to exchange my CPU, I'm probably gonna get a 1700 or 1700X...
     
  6. Kromaatikse

    Kromaatikse Member

    Joined:
    Mar 4, 2017
    Messages:
    83
    Likes Received:
    164
    Linux has a *very good* scheduler. It already avoids moving threads around needlessly as Windows does, and it's already aware of a wide variety of topologies (NUMA and otherwise). The kernel devs responsible are almost certainly fine-tuning it for Ryzen as we speak, but it is already reasonable.
     
    french toast and CatMerc like this.
  7. mvitkun

    mvitkun Junior Member

    Joined:
    Jan 22, 2013
    Messages:
    2
    Likes Received:
    0
  8. looncraz

    looncraz Senior member

    Joined:
    Sep 12, 2011
    Messages:
    711
    Likes Received:
    1,598
    You can actually set cpu groupsize using bcdedit and force NUMA on in Windows. But this will limit most individual processes to just one CCX, rather than just the ones which have a problem.

    The scheduler just needs to get smart enough to keep threads on the same CCX on which their data has likely remained.

    Games, meanwhile, need to manually set affinities for some of their threads on Ryzen for the best results. Locking a control loop thread to a single core will allow that core's "AI" to adapt, and a few bonus points can be gained.

    As for what happens on 7nm... faster fabric, more CCXes, problem largely resolved. Maybe an L4.
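
To make the affinity idea concrete, here is a minimal Python sketch that builds a per-CCX affinity bitmask. The layout is an assumption, not something confirmed in this thread: 2 CCXes of 4 cores with SMT give 16 logical CPUs, numbered consecutively so that 0-7 sit on the first CCX and 8-15 on the second. The helper names are hypothetical.

```python
# Assumed layout (not confirmed in this thread): 2 CCXes x 4 cores x 2 SMT
# threads, with logical CPUs numbered consecutively per CCX.
CORES_PER_CCX = 4
THREADS_PER_CORE = 2


def ccx_logical_cpus(ccx_index, cores_per_ccx=CORES_PER_CCX,
                     threads_per_core=THREADS_PER_CORE):
    """Return the logical CPU numbers assumed to belong to one CCX."""
    per_ccx = cores_per_ccx * threads_per_core
    start = ccx_index * per_ccx
    return list(range(start, start + per_ccx))


def affinity_mask(cpus):
    """Fold a list of logical CPU numbers into a bitmask (bit n = CPU n)."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


ccx0_mask = affinity_mask(ccx_logical_cpus(0))
ccx1_mask = affinity_mask(ccx_logical_cpus(1))
print(hex(ccx0_mask), hex(ccx1_mask))
```

A mask of this shape is the kind of value the Win32 SetThreadAffinityMask call takes. Whether the numbering above matches a given board and BIOS would need to be checked on the actual system first.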
     
    CatMerc, Drazick and rvborgh like this.
  9. imported_jjj

    imported_jjj Senior member

    Joined:
    Feb 14, 2009
    Messages:
    589
    Likes Received:
    398
    The Infinity Fabric is at 512 GB/s in Vega, but the latency is unclear.
     
  10. rvborgh

    rvborgh Member

    Joined:
    Apr 16, 2014
    Messages:
    95
    Likes Received:
    52
    Interesting to read this...

    I had this same issue on my 48-core overclocked quad Opteron setup... I had a game that had a max of 6 threads... 1 heavy and the rest light. Performance was alright until I got to about 1800 bots... After much tuning... and trying different things... I found that using Process Lasso and tying the game to the fast cores on the processor in the first socket (4 out of the 12 K10s run fast on my system for each socket)... upped the frame rate by about 80-90 fps at the start of the game (from low-to-mid 200s to 335)... and I could run the game with acceptable fps up to around 2500 bots.

    I really recommend folks try Process Lasso on their Ryzens... as it seems they have the same Windows scheduling issue.
     
  11. french toast

    french toast Senior member

    Joined:
    Feb 22, 2017
    Messages:
    426
    Likes Received:
    332
    I guess the Infinity Fabric is not a fixed width; it probably comes in links to scale up or down as needed.
     
  12. Evil Azrael

    Evil Azrael Junior Member

    Joined:
    Mar 4, 2017
    Messages:
    2
    Likes Received:
    0
    @The Stilt,
    So far you seem to be the only one to get Win7 running. Could you write a few words on how you managed to do this? My own tries with my existing installation or with the Win7 installation DVD (USB & SATA drive) did not have much success; Win7 always hangs during booting.

    Could you please tell us how you managed to get Win7 running?
     
  13. lolfail9001

    lolfail9001 Senior member

    Joined:
    Sep 9, 2016
    Messages:
    911
    Likes Received:
    305
    Anyways, I thought the Phoronix data for Dota 2 Vulkan was pretty curious in regards to core scaling (SMT is enabled everywhere).
    http://www.phoronix.com/scan.php?page=article&item=amd-ryzen-cores&num=2

    Uhm, they made up a fancy name for the memory controller?


    Why would you do thread shuffling in an age when the main issue most of the time is cache/memory access in the first place? Just curious.

    P.S. At this point I am confident that Himeno and Prime95 have the same issue: for some reason they use the K10 codepath without any AVX. For Prime95 I am confident because I have seen a screenshot that seems to claim as much; for Himeno it is the only viable explanation left.
     
    lightmanek likes this.
  14. Hi-Fi Man

    Hi-Fi Man Senior member

    Joined:
    Oct 19, 2013
    Messages:
    457
    Likes Received:
    30
    This inter-CCX communication issue reminds me of the Core 2 Quads, just without the MCM and FSB. I wonder how Windows dealt with that; was XP ever patched to deal with that? Is the right solution to treat each CCX as a NUMA node?
     
  15. KTE

    KTE Senior member

    Joined:
    May 26, 2016
    Messages:
    477
    Likes Received:
    130
    @TheStilt .. Sorry if I've missed any of this. I can't stand Win 10, so don't like to use it except for a tablet.

    Does parking a Core with Ryzen occur in logical pairs or one at a time?

    What conditions have to be present in Win10 for the Core to be parked?

    Are loads juggled about in Win 10? And seriously, why would they be if Cores are supposed to be idle->parked for efficiency? That makes no sense.

    Core Park Manager and other tools can monitor and disable this feature, but so can the High Performance setting or changing min/max power values.

    What's the DRAM/CCX inter bandwidth/latency with a few parked Cores? i.e. does it change?

    Phenoms changed drastically.

    What did you attain for L1, L2, L3 access latency? (tried Franck@CPUIDs tool?)

    What is Ryzen's idling frequency/voltage?

    What are idle/load stock CPU temps like? Do the sensors seem accurate? (compared to K10 gen)

    Can you monitor core throttling? Have you tried loading stock Ryzen to see if it throttles?

    Trying to explain performance issues...

    1. In every ideal arch, these areas are on separate power planes and clock domains... Completely separate voltage islands.

    This is a major shortsightedness by AMD with any 'IMC problems'. Phenom suffered due to IMC clocking and power shenanigans. That was 2006.

    Having a linked CCX power plane is still backwards.

    It's entirely possible that clocking might be impaired by their CCX more than purely by the 14nm LPP process. It was like that with Phenom's IMC/L3 previously.

    AMD implemented decoupling characteristics back in 2007/2008 silicon and saved a ton of power and performance.

    It is never a case of just decoupling and having it work, though. DRAM<->L3<->Fabric is very tricky to get right. Clocks and power generally pose major sync and timing issues that have to be solved to avoid corrupted data. Ryzen's implementation simply looks like a quick and easy job done under time constraints. It's obvious they couldn't afford to spend as much time as they needed on it.

    Going forward, it would be the first area they would look to change.

    2. These 'Windows issues' are AMD issues. We had them with Phenom for Christ's sake! When the workload bounced, with CnQ active, performance sucked and stuttered. AMD would have KNOWN about these issues during 'design considerations' 5 years back. It's AMD who has to adapt and get these fixed. Borked chip releases are only destroying their own image and income.

    Secondly, Core Parking has to be implemented by AMD. If you don't have a working driver for your test OS, why allow this?

    Telling reviewers to switch to the High Performance profile, which doesn't park the cores, is just a band-aid at best and very misleading about your real-world performance. Ryzen obviously has issues sleeping and waking the cores. Again, a Phenom issue.

    3. Having two CCX at low interconnect bandwidth/high latency is an even bigger flaw, but this again will not be by design. This is going to pose a huge problem on Server workloads unless fixed. Forget HPC altogether.

    4. Now you see why Server was not launched. AMD chose their smallest, least risk market to troubleshoot the chip.

    5. Intellectual prediction won't magically gain 10-15% performance -- this is wishful thinking, like the pre-release hype. Every one of which turned out wrong.


    I'm sure K10 had an app that could change the skew when different PStates are entered, and even force certain PStates.

    And it would show the separate power plane portions.


    Sent from HTC 10
    (Opinions are own)
     
    #240 KTE, Mar 5, 2017
    Last edited: Mar 5, 2017
    T1beriu and Ajay like this.
  16. Greyguy1948

    Greyguy1948 Member

    Joined:
    Nov 29, 2008
    Messages:
    92
    Likes Received:
    12
    Regarding Excavator: can you disable CMT in BIOS?
    Regarding Himeno and Nbody: is it likely that another size of these benchmarks would give a very different result?
    3D Euler seems to be hard to predict; Caselab and CFD are very different.
    3D Euler tested at techreport.com is even worse for Ryzen.
     
  17. FlanK3r

    FlanK3r Senior member

    Joined:
    Sep 15, 2009
    Messages:
    270
    Likes Received:
    16
    Guys, what's the best temp monitoring with Ryzen? I would like to use Core Temp... if it works correctly. Or the HWiNFO sensors? Thx.
     
  18. iBoMbY

    iBoMbY Member

    Joined:
    Nov 23, 2016
    Messages:
    99
    Likes Received:
    54
    I guess the HWinfo temperatures are correct at least. But my Crosshair just bricked, so any further testing has to wait. This could be a serious bug btw. I'm not the only one it seems. Edit: And this one seems to be similar as well.
     
    #243 iBoMbY, Mar 5, 2017
    Last edited: Mar 5, 2017
    lightmanek and FlanK3r like this.
  19. loccothan

    loccothan Senior member

    Joined:
    Mar 8, 2013
    Messages:
    270
    Likes Received:
    0
    Great Thread THX for the effort :)
     
  20. William Gaatjes

    Joined:
    May 11, 2008
    Messages:
    14,954
    Likes Received:
    79
    I guess it is better if I ask this question in this thread.

    I was wondering about something. I read that for some situations, like gaming, it is better to disable SMT.
    But when SMT is disabled, the 8 threads on the cores can stall just as often as with SMT enabled; with SMT enabled, other threads take over the execution time and fill in the gaps. The only difference is that with SMT disabled, when the thread stalls, the core stalls as well. This would mean that Ryzen would dissipate less heat with SMT disabled because there are moments when the core does less work, allowing for higher sustained overclocks and boost clocks.

    Does that make any sense or am I overlooking something?

    I mean, 8 cores would be sufficient for a lot of people.
    Also, I have read that going for 2666MHz single-sided (single-rank, memory chips on only 1 side) DDR4 modules is the best option to get high memory speeds.

    Is that true ?
     
  21. KompuKare

    KompuKare Senior member

    Joined:
    Jul 28, 2009
    Messages:
    544
    Likes Received:
    59
    I would imagine the same way you'd get it working on any new hardware:
    1. Either slipstream the drivers, or
    2. Have them ready on a USB stick and tell Windows about it at the right moment.
    The chipset drivers you should be able to get from most motherboard makers, for example:
    https://www.asus.com/uk/Motherboards/ROG-CROSSHAIR-VI-HERO/HelpDesk_Download/
    If you've never slipstreamed before, it's probably best if you follow a guide preferably starting with the SP1 ISO from Microsoft:
    https://www.microsoft.com/en-gb/software-download/windows7 (that now requires your product key, which is a hassle, especially if you want to make a universal disc)
    This is a guide to slipstreaming the Intel RAID drivers:
    http://www.win-raid.com/t750f25-Guide-Integration-of-drivers-into-a-Win-image.html
    That uses NTLite which makes it easy but it is possible to do the same without third party tools.
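
For the route without third-party tools, a rough sketch of the idea is below. It assumes Microsoft's DISM tool is used for the offline driver injection, which is my assumption and not something the guide above states. The Python helper only builds the three command lines (mount, add drivers, unmount with commit) and does not run anything. The function name and all paths are hypothetical.

```python
def dism_slipstream_commands(wim_path, mount_dir, driver_dir, index=1):
    """Build (but do not run) DISM command lines for offline driver injection."""
    return [
        # 1. Mount one image from the WIM file into an empty folder.
        ["dism", "/Mount-Wim", f"/WimFile:{wim_path}",
         f"/Index:{index}", f"/MountDir:{mount_dir}"],
        # 2. Add every driver (.inf) found under driver_dir to the mounted image.
        ["dism", f"/Image:{mount_dir}", "/Add-Driver",
         f"/Driver:{driver_dir}", "/Recurse"],
        # 3. Unmount and write the changes back into the WIM.
        ["dism", "/Unmount-Wim", f"/MountDir:{mount_dir}", "/Commit"],
    ]


# Hypothetical paths, for illustration only.
for cmd in dism_slipstream_commands(r"C:\win7\sources\install.wim",
                                    r"C:\mount", r"C:\drivers"):
    print(" ".join(cmd))
```

If the installer itself hangs or cannot see the USB or storage devices, the same steps would presumably have to be repeated for the images in boot.wim as well, since that is what Setup boots from.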
     
  22. JDG1980

    JDG1980 Golden Member

    Joined:
    Jul 18, 2013
    Messages:
    1,349
    Likes Received:
    166
    Wow. That's incredibly impressive, especially when compared to results for existing laptops. The only ones that get a higher Cinebench ST score than that are two Clevo desktop replacements that use enthusiast K-series CPUs (and thus have high TDPs). For normal laptop CPUs, the best score recorded was 147.74, on the Dell XPS 15. That laptop uses a Core i7-6700HQ Skylake CPU with a TDP of 45W. And Ryzen can beat that by ~5% at two-thirds the TDP?!
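
As a quick back-of-the-envelope check of what those two ratios imply together, here is a small Python sketch. It uses only the figures quoted above (45 W, ~5%, two-thirds) and treats TDP as if it were actual power draw, which is a simplifying assumption. The function name is hypothetical.

```python
baseline_tdp_w = 45.0    # Core i7-6700HQ TDP quoted in the post
score_ratio = 1.05       # "~5%" higher Cinebench ST score, per the post
tdp_ratio = 2.0 / 3.0    # "two-thirds the TDP", per the post


def perf_per_watt_gain(score_ratio, tdp_ratio):
    """Relative score-per-TDP-watt, treating TDP as power (an assumption)."""
    return score_ratio / tdp_ratio


implied_tdp_w = baseline_tdp_w * tdp_ratio
gain = perf_per_watt_gain(score_ratio, tdp_ratio)
print(f"implied TDP: {implied_tdp_w:.1f} W, score per TDP watt: {gain:.3f}x")
```

Under those assumptions the implied TDP is about 30 W and the score per TDP watt comes out roughly 1.6 times higher, though TDP ratings from different vendors are not directly comparable.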
     
  23. The Stilt

    The Stilt Golden Member

    Joined:
    Dec 5, 2015
    Messages:
    1,202
    Likes Received:
    1,363
    R7 1700 SHOULD have 3.05GHz ACXFRC (due to being non-X), however there is indication that it's not necessarily the case.
    Unfortunately I don't have the data available, to check it right now.

    Based on my own tests, the average performance benefit from higher than 2400MHz DRAM (i.e. 1200MHz DFICLK) is very marginal in 2D, even with 8C/16T config. With a smaller core count, even 2133MHz is fine.

    There are various interfaces / fabrics inside Zeppelin, and their functionality and relations are not fully known at the moment. So even though the data fabric operates at half the effective MEMCLK, that doesn't necessarily mean that the inter-CCX connections are operating at the speculated width and speed. There are parts of the fabric which are 256-bit wide, for example.
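
The clock relation above can be put into numbers with a short Python sketch. The fabric clock follows the half-of-effective-MEMCLK rule stated above. The bandwidth figure is only an illustrative peak for a hypothetical 256-bit link moving one transfer per clock, which is an assumption and not a claim about the actual inter-CCX connection. The function names are hypothetical.

```python
def fabric_clock_mhz(ddr_rate):
    """Data fabric clock as half the effective memory speed (e.g. 2400 -> 1200)."""
    return ddr_rate / 2


def link_peak_gb_s(clock_mhz, width_bits=256):
    """Peak GB/s for one link moving width_bits per clock (assumed model)."""
    bytes_per_clock = width_bits / 8
    return clock_mhz * 1e6 * bytes_per_clock / 1e9


for rate in (2133, 2400, 2666, 3200):
    clk = fabric_clock_mhz(rate)
    print(f"DDR4-{rate}: fabric {clk:.1f} MHz, "
          f"256-bit link peak {link_peak_gb_s(clk):.1f} GB/s")
```

For 2400MHz DRAM this gives a 1200MHz fabric clock and a theoretical 38.4 GB/s for such a link. The real figure depends on widths and protocols that, as noted, are not fully known.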

    Moving to another node (such as 16nm FF+) is really a no-brainer, if it yields <10% increased Fmax. Porting the design will be extremely expensive, however not NEARLY as expensive as trying to increase the IPC of the µarch itself. The results in terms of performance are always guaranteed when increasing the frequency of an existing design, while that's not the case with a modified µarch featuring higher IPC. The modified µarch may not necessarily be able to reach the same speeds as the old one did, so the actual performance may remain the same or even degrade.
     
    T1beriu, .vodka, Ajay and 3 others like this.
  24. The Stilt

    The Stilt Golden Member

    Joined:
    Dec 5, 2015
    Messages:
    1,202
    Likes Received:
    1,363
    Unfortunately I don't have the equipment to do that.
     
    Drazick and Encrypted11 like this.
  25. DisEnchantment

    Joined:
    Mar 3, 2017
    Messages:
    93
    Likes Received:
    83
    So does it make more sense cost- and time-wise for AMD to use Samsung 14nm LPU, or to totally port their design to TSMC 16nm FF+?
    TSMC's 16 FF+ seems to clock higher than Samsung's 14nm LPP/LPC, but isn't LPU supposed to catch up here?
    Also, GloFo will skip 10nm, right?