Ryzen: Strictly technical

Status
Not open for further replies.

Pookums

Member
Mar 6, 2017
32
13
36
Lurker here.

I noticed that the overclocking and Ryzen Master guide on AMD's official website lists both FCLK and UCLK as values for overclocking on the Ryzen platform. I don't have a Ryzen and motherboard at present, and I have not seen them listed for modification in any BIOS unless they are located under "advanced core options." Can these values be modified, Stilt? You mentioned some BIOS options that are currently hidden.

Now the FCLK (I believe this is the Infinity Fabric on AMD) seems plenty fast (even superior to Intel's ring bus). However, the UCLK memory controller speed seems downright abhorrent (at half the memory speed), and I believe this might be the major cause of the poor memory latency and the odd game and draw call behavior.

I read up on Intel's Nehalem architecture and how it has changed up to Kaby Lake today. Originally the ideal was to run the uncore at 2x the full memory speed (e.g. with DDR-1600, a 3.2GHz uncore or higher was ideal). This has moved toward parity as DDR speeds have improved substantially. However, there have been warnings in some throwaway threads on message boards to always set the uncore ABOVE the full memory speed, otherwise performance could drop substantially. Some suggested this was most noticeable in Linpack and games.

Even the 6900K has a default 2.8GHz uncore. When using faster memory and overclocking, it's suggested to raise the 6900K's uncore to 3.2GHz, up to a maximum of around 3.5GHz (heat supposedly becomes an issue above this point).

So my question is: can the memory controller be modified in any way to achieve a much higher UCLK? (The webpage only listed its existence and didn't mention where to modify it, or even whether it appears in any Ryzen BIOS.)

If not, can anyone take a current-gen Intel i7 (preferably a Broadwell), downclock its uncore to the abysmal speeds of Ryzen's memory controller, and compare MajinCry's draw call benchmark, memory latency measurements and games before and after?
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
OK, so I've now changed the charts to use a common Y-axis where possible.
An additional data point was added to the charts to indicate performance with 256-bit workloads excluded (the ones which have actual gains from 256-bit code).
The excluded 256-bit workloads are: Blender, Bullet (IPC only), Embree, Euler3D, Himeno, Linpack, NBody & X265.

I also noticed that there is an error affecting the Kaby Lake results in the 4C/4T, 4C/8T and SKU vs. SKU tests. The absolute data is correct, however the summary views are affected by a tiny amount.
This is because I made a typo in the "Caselab Euler3D" calculation for Kaby Lake (reversed calculation). Because of that, the summary views will show small changes for Kaby Lake only. In the 4C/4T summary the difference will shrink by >= 0.5% and in 4C/8T it will increase by ~1% (the same applies to SKU vs. SKU).

I'll leave the gallery codes (Imgur) for the original charts in the OP after changing the charts, so anyone can inspect the results if necessary.
 

agouraki

Member
Feb 18, 2017
26
15
51
Has anyone tried using Process Lasso to pin a game that uses at most 8 threads to a single CCX, and compared the results?
 

R0H1T

Platinum Member
Jan 12, 2013
2,582
162
106
Has anyone tried using Process Lasso to pin a game that uses at most 8 threads to a single CCX, and compared the results?
You don't need Process Lasso for that; Process Hacker or Process Explorer will do. Process Lasso isn't a bad choice, but it also affects the game/application and could produce unrealistic gains relative to systems that don't have it installed.
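
For anyone trying this, Windows represents process affinity as a bitmask with one bit per logical processor. A minimal sketch of building such a mask follows. The function name is made up for illustration, and it assumes that logical CPUs 0-7 are the four cores (two SMT threads each) of the first CCX, which should be checked on the actual system.

```python
def affinity_mask(logical_cpus):
    """Return a bitmask with one bit set per logical CPU index."""
    mask = 0
    for cpu in logical_cpus:
        mask |= 1 << cpu
    return mask


# Assumption: logical CPUs 0-7 map to the first CCX (4 cores x 2 SMT threads).
CCX0_LOGICAL_CPUS = range(8)

if __name__ == "__main__":
    print(hex(affinity_mask(CCX0_LOGICAL_CPUS)))  # prints 0xff
```

The same set of CPUs is what you would tick in the affinity dialog of whichever tool you use.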
 

Triskain

Member
Sep 7, 2009
57
8
71
Hello Stilt, can you share some information about the current and load-line specifications for the AM4 socket infrastructure? This information would be useful for evaluating motherboard VRMs.
 

KTE

Senior member
May 26, 2016
478
130
76
c6JTnh4.png


I'm sorry, but where's the rest of this graph? I picked this one, but I could have picked any of at least a dozen others that suffer from the same problem as this one does.

By cutting out 85% of this graph, you've made Haswell and Kaby Lake look like they have at least twice the performance of Zen. In other graphs it's a different story with Zen apparently being massively better than the others. Stop it. Stop misleading and deceiving people with these misleading graphics.

It's not even consistent. In some graphs you have included the entire thing. For example

j4ZVTx7.png


But at a glance (and first impressions matter) the first graph shows a far more impressive performance lead than the second one does, even though the second one has a 52% margin!

Stop it. Stop being deceitful, stop being misleading.
It's not misleading when you have the graphs CLEARLY marked in percentage difference. Quit the over-sensitivity.
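
To put numbers on the disagreement, here is a minimal sketch of how a truncated Y-axis changes the apparent ratio between two bars. The values are made up for illustration and are not taken from the charts, and the function name is hypothetical.

```python
def drawn_ratio(a, b, axis_min=0.0):
    """Ratio of bar heights as drawn when the Y-axis starts at axis_min."""
    if min(a, b) <= axis_min:
        raise ValueError("both values must lie above the start of the axis")
    return (a - axis_min) / (b - axis_min)


if __name__ == "__main__":
    # Hypothetical scores: 110 vs. 100, i.e. a 10% real difference.
    print(drawn_ratio(110, 100))               # full axis: about 1.1
    print(drawn_ratio(110, 100, axis_min=90))  # axis starts at 90: 2.0
```

The bars look 2:1 on the truncated axis, while the labelled percentage difference stays the same either way.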

Edit: better word.

Sent from HTC 10
(Opinions are own)
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
Hello Stilt, can you share some information about the current and load-line specifications for the AM4 socket infrastructure? This information would be useful for evaluating motherboard VRMs.

Not for the final silicon, unfortunately.
They changed drastically in the final silicon.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
Lurker here.

I noticed that the overclocking and Ryzen Master guide on AMD's official website lists both FCLK and UCLK as values for overclocking on the Ryzen platform. I don't have a Ryzen and motherboard at present, and I have not seen them listed for modification in any BIOS unless they are located under "advanced core options." Can these values be modified, Stilt? You mentioned some BIOS options that are currently hidden.

Now the FCLK (I believe this is the Infinity Fabric on AMD) seems plenty fast (even superior to Intel's ring bus). However, the UCLK memory controller speed seems downright abhorrent (at half the memory speed), and I believe this might be the major cause of the poor memory latency and the odd game and draw call behavior.

I read up on Intel's Nehalem architecture and how it has changed up to Kaby Lake today. Originally the ideal was to run the uncore at 2x the full memory speed (e.g. with DDR-1600, a 3.2GHz uncore or higher was ideal). This has moved toward parity as DDR speeds have improved substantially. However, there have been warnings in some throwaway threads on message boards to always set the uncore ABOVE the full memory speed, otherwise performance could drop substantially. Some suggested this was most noticeable in Linpack and games.

Even the 6900K has a default 2.8GHz uncore. When using faster memory and overclocking, it's suggested to raise the 6900K's uncore to 3.2GHz, up to a maximum of around 3.5GHz (heat supposedly becomes an issue above this point).

So my question is: can the memory controller be modified in any way to achieve a much higher UCLK? (The webpage only listed its existence and didn't mention where to modify it, or even whether it appears in any Ryzen BIOS.)

If not, can anyone take a current-gen Intel i7 (preferably a Broadwell), downclock its uncore to the abysmal speeds of Ryzen's memory controller, and compare MajinCry's draw call benchmark, memory latency measurements and games before and after?

UCLK, FCLK & DFICLK default to half of the effective MEMCLK frequency (e.g. DDR-2400 = 1200MHz).
There is a way to configure the memory controller (UCLK) for 1:1 rate, however that is strictly for debug and therefore completely untested. The end-user has neither the knowledge nor the hardware to change it.
AFAIK FCLK & DFICLK are both fixed and cannot be tampered with. However, certain related fabrics, which run at the same speed, have their own frequency control. The "infinity fabric" (GMI) runs at 4x FCLK frequency.
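
As a rough sketch of the arithmetic above, assuming only the default ratios stated in this post (half the effective memory rate for UCLK, FCLK and DFICLK, and 4x FCLK for GMI); the function and dictionary key names are illustrative:

```python
def default_clocks(effective_memclk_mhz):
    """Derive the default clocks (in MHz) from the effective DDR rate."""
    base = effective_memclk_mhz / 2  # UCLK, FCLK and DFICLK all default to this
    return {
        "UCLK": base,
        "FCLK": base,
        "DFICLK": base,
        "GMI": 4 * base,  # GMI runs at 4x FCLK
    }


if __name__ == "__main__":
    for name, mhz in default_clocks(2400).items():
        print(f"{name}: {mhz:.0f} MHz")
```

For DDR-2400 this gives 1200MHz for UCLK, FCLK and DFICLK, and 4800MHz for GMI.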
 

KTE

Senior member
May 26, 2016
478
130
76
Frankly I don't believe that the 14nm LPP will significantly improve any more. 14nm LPE, which is extremely closely related to LPP, has been in mass production for 21 months now. Also the 14nm LPP itself has been in mass production for 15 months.
AMD most likely could tweak a thing or two in the design in order to reach higher Fmax, however I fear that most of the tricks were already used in the final stepping of Zeppelin.

For the upcoming Zen iterations I would rather see AMD stay on a process that has been fully tested in practice, even if it means falling behind in absolute node size. 16nm FF+ is extremely well proven, and 14nm HP (IBM) should also be available at some point from GlobalFoundries. Moving immediately to a new 7nm node would be like diving head first into unknown waters, once more.
Did you miss my questions? Post #38776573.

I agree on LPP/LPE, but I bet AMD can still eke out 200MHz more, especially in XFR. I also suspect they will get that 1800 with a smaller boost to 65W. This is what LPs are made for.

A port to another process would take more than a year, and for what benefit? It's not that simple to move a new design from LP to HP.

The only one worth it is GF's HP, which IBM will use.

But I am pretty sure Ryzen will blow the power budget with that, and I really don't think the design has it to hit much higher clocks. With that process, you're sacrificing power for clocks.

3.2GHz 95W -> 3.6GHz 125W -> 4.2GHz 140W -> 4.4GHz 200W.

(Estimated actual HVM power)
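
A quick sketch of what those estimates imply per step, using only the four frequency/power points above (the function name is illustrative, and the inputs are my estimates, not measurements):

```python
# (frequency in MHz, estimated power in W), taken from the line above.
ESTIMATES = [(3200, 95), (3600, 125), (4200, 140), (4400, 200)]


def watts_per_100mhz(points):
    """Marginal power cost of each step, in watts per extra 100 MHz."""
    steps = []
    for (f0, p0), (f1, p1) in zip(points, points[1:]):
        steps.append((p1 - p0) / ((f1 - f0) / 100))
    return steps


if __name__ == "__main__":
    print(watts_per_100mhz(ESTIMATES))  # prints [7.5, 2.5, 30.0]
```

The last 200MHz costs far more per 100MHz than either of the earlier steps.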

Sent from HTC 10
(Opinions are own)
 

Pookums

Member
Mar 6, 2017
32
13
36
UCLK, FCLK & DFICLK default to half of the effective MEMCLK frequency (e.g. DDR-2400 = 1200MHz).
There is a way to configure the memory controller (UCLK) for 1:1 rate, however that is strictly for debug and therefore completely untested. The end-user has neither the knowledge nor the hardware to change it.
AFAIK FCLK & DFICLK are both fixed and cannot be tampered with. However, certain related fabrics, which run at the same speed, have their own frequency control. The "infinity fabric" (GMI) runs at 4x FCLK frequency.

Thanks for the response. Is it possible they could be convinced to work with motherboard manufacturers on a BIOS update so UCLK could be changed for normal use outside of debug mode, possibly with a public release of technical information on its frequency and/or voltage limits, both 24/7 safe and maximum?

While I currently have no way of knowing for certain, I'm fairly confident low UCLK is the major reason for the poorer-than-expected showing in memory latency and in certain benchmarks where it was unable to maintain parity with the 6900K.

Other clock rates seem more than sufficient (Infinity Fabric itself seems extraordinary), which is why I'm primarily focused on this single issue.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
Thanks for the response. Is it possible they could be convinced to work with motherboard manufacturers on a BIOS update so UCLK could be changed for normal use outside of debug mode, possibly with a public release of technical information on its frequency and/or voltage limits, both 24/7 safe and maximum?

While I currently have no way of knowing for certain, I'm fairly confident low UCLK is the major reason for the poorer-than-expected showing in memory latency and in certain benchmarks where it was unable to maintain parity with the 6900K.

Other clock rates seem more than sufficient (Infinity Fabric itself seems extraordinary), which is why I'm primarily focused on this single issue.

I don't think it's even possible without external hardware (HDT), so pretty much no.
 

PPB

Golden Member
Jul 5, 2013
1,118
168
106
I'm not sure.
Gen. 3 can usually be sustained up to 107MHz though.

This is without an external CLK generator, I take it? Some people are fussing about which motherboards have one and which don't, but IMO all motherboards should be able to overclock BCLK; only the ones with an external CLK generator would do so without the PCI-E and SoC SATA corruption problems.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
This is without an external CLK generator, I take it? Some people are fussing about which motherboards have one and which don't, but IMO all motherboards should be able to overclock BCLK; only the ones with an external CLK generator would do so without the PCI-E and SoC SATA corruption problems.

PCIe is tied to the BCLK, regardless of whether an external or internal PLL (FCH) is used. The linking occurs at the silicon level, so there is nothing you can do about it.
External PLLs are more precise, but that's about it. It generally makes no difference for overclocking the BCLK.
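
To illustrate why the link suffers, here is a minimal sketch that assumes a nominal 100MHz reference clock and a PCIe signalling rate that scales linearly with BCLK. Both are simplifying assumptions, and the names are made up for illustration.

```python
NOMINAL_BCLK_MHZ = 100.0       # assumed nominal reference clock
PCIE_GEN3_NOMINAL_GTS = 8.0    # PCIe Gen. 3 nominal rate per lane, in GT/s


def pcie_rate_gts(bclk_mhz, nominal_gts=PCIE_GEN3_NOMINAL_GTS):
    """Effective per-lane rate if the link scales 1:1 with BCLK."""
    return nominal_gts * bclk_mhz / NOMINAL_BCLK_MHZ


def overclock_percent(bclk_mhz):
    """How far the link runs above its nominal rate, in percent."""
    return (bclk_mhz / NOMINAL_BCLK_MHZ - 1.0) * 100.0


if __name__ == "__main__":
    print(f"{pcie_rate_gts(107):.2f} GT/s, +{overclock_percent(107):.1f}%")
```

Under these assumptions, 107MHz BCLK means the Gen. 3 link runs about 7% above its nominal 8 GT/s, whichever PLL generates the clock.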
 

moonbogg

Lifer
Jan 8, 2011
10,635
3,095
136
Please excuse my ignorance, but I am just curious about all of this CCX talk limiting Ryzen's gaming performance. People seem to be placing their hopes in the upcoming Windows update to fix Ryzen gaming performance. Is there any reality to this? I personally expect the performance of the chips to stay the same, but I don't know, so here I am asking the smart people.
 

Malogeek

Golden Member
Mar 5, 2017
1,390
778
136
yaktribe.org
With a single CCX and MHz apples to apples, it's equal to the 7700K in gaming performance as well. That's very impressive for their first generation of a new architecture. Would love to see more tests like this to confirm the results.

It obviously shows that work needs to be done on the scheduler and the CCX block architecture in full 8C/16T mode, which is being reiterated in many other places as well, but that's damn nice from AMD.
 

Malogeek

Golden Member
Mar 5, 2017
1,390
778
136
yaktribe.org
People seem to be placing their hopes in the upcoming Windows update to fix Ryzen gaming performance. Is there any reality to this?
Probably somewhere in the middle: improvements, but likely not something that can be completely removed. New CPU architectures have a history of such things. What's great, in my opinion, is what we're starting with.
 

PPB

Golden Member
Jul 5, 2013
1,118
168
106
Please excuse my ignorance, but I am just curious about all of this CCX talk limiting Ryzen's gaming performance. People seem to be placing their hopes in the upcoming Windows update to fix Ryzen gaming performance. Is there any reality to this? I personally expect the performance of the chips to stay the same, but I don't know, so here I am asking the smart people.
Think about two-processor systems. How does the OS know that the L3 cache in processor 1 is not shared at the silicon level with the L3 cache of processor 2, so that the cores of processor 1 aren't tasked with searching for data in processor 2 (which would be damn slow)? With NUMA nodes. This is how you differentiate two processors' memory subsystems and avoid such conflicts. This could be a really easy task for the kernel/OS if done. Hopefully it's implemented soon so we don't see more cache thrashing and thread shifting between cores on different CCXs.
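
A toy sketch of the idea, treating each CCX like a node and counting how often a thread hops between them. The logical CPU numbering (8 logical CPUs per CCX, numbered consecutively) is an assumption, and the trace is made up; a real scheduler is far more involved.

```python
def ccx_of(logical_cpu, threads_per_ccx=8):
    """Map a logical CPU index to its CCX, assuming consecutive numbering."""
    return logical_cpu // threads_per_ccx


def cross_ccx_migrations(cpu_trace, threads_per_ccx=8):
    """Count moves in a thread's CPU history that cross a CCX boundary."""
    count = 0
    for prev, cur in zip(cpu_trace, cpu_trace[1:]):
        if ccx_of(prev, threads_per_ccx) != ccx_of(cur, threads_per_ccx):
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical history of which logical CPU one thread ran on.
    print(cross_ccx_migrations([0, 2, 9, 10, 3]))  # prints 2
```

A scheduler that knows about the boundary would try to keep that count at zero for threads that share data.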

Sent from my XT1040 using Tapatalk
 


raghu78

Diamond Member
Aug 23, 2012
4,093
1,475
136
Wow, that Ryzen tested as a quad core at 4GHz against a 7700K at 4GHz is quite good. In fact, the inter-CCX bandwidth problem is automatically solved by having all 4C/8T on a single CCX. I think if AMD can use a single CCX for the R5 1500X with clocks of 3.9-4.0GHz for 1-2 cores and 3.7GHz for all-core turbo, it would be a very good gaming chip too, especially if AMD can price it at USD 179 (instead of USD 199). It would basically be very close to the Core i5 7600K but at a significantly lower price. In fact, Intel's non-K Skylake chips would have a tough time against it.
 