Ryzen: Strictly technical

unseenmorbidity · Mar 6, 2017

The Stilt said:
Verify in what respect?
Sure, you can hit those BCLKs if you set the PCIe to run in Gen. 1 mode.

At what point does it go into generation 1?

People are talking about the new G.Skill memory that is run with a 120MHz BCLK.

The Stilt · Mar 6, 2017

unseenmorbidity said:
At what point does it go into generation 1?

People are talking about the new G.Skill memory that is run with a 120MHz BCLK.

I'm not sure.
Gen. 3 can usually be sustained up to 107MHz though.

Pookums · Mar 6, 2017

Lurker here.

I noticed that the overclocking and Ryzen Master guide on AMD's official website lists both FCLK and UCLK as values for overclocking on the Ryzen platform. I do not have a Ryzen and motherboard at present, and I have not seen them listed for modification in any BIOS unless they are located under "advanced core options." Can these values be modified, Stilt? You mentioned some BIOS options that are currently hidden.

Now the FCLK (I believe this is Infinity Fabric on AMD) seems plenty fast enough (even superior to Intel's ring bus). However, the UCLK memory controller speeds seem downright abhorrent (at half memory speed), and I believe this might be the major cause of the poor memory latency and the odd game and draw call behavior.

I read back on Intel's Nehalem architecture and its changes going into Kaby Lake today. Originally the ideal was to have 2x the uncore to 1x full memory speed (i.e. with DDR-1600, a 3.2GHz uncore or higher was ideal). This has moved toward parity as DDR speeds have substantially improved. However, there have been warnings on some throwaway threads on message boards to always set the uncore ABOVE the complete memory speed, otherwise it could substantially reduce performance. Some suggested this was most noticeable in Linpack and games.

Even the 6900K has a default 2.8GHz uncore. When using faster memory and overclocking, it's suggested to raise the 6900K's uncore to 3.2GHz, up to a maximum of around 3.5GHz (heat supposedly becomes an issue above this point).

So my question is, can the memory controller be modified in any way to achieve a much higher UCLK (their webpage only listed that it exists and didn't mention where to modify it, or even whether the option is in any Ryzen BIOS)?

If not, can anyone take a current-gen Intel i7 (preferably a Broadwell), downclock its uncore to the abysmal speeds of Ryzen's memory controller, and compare majincry's draw call benchmark, memory latency measurements and games before and after?

The Stilt · Mar 6, 2017

Ok, so I've now changed the charts to have a common Y-axis, where possible.
An additional data point was added to the charts to indicate performance with 256-bit workloads excluded (the ones which have actual gains from 256-bit code).
The excluded 256-bit workloads are: Blender, Bullet (IPC only), Embree, Euler3D, Himeno, Linpack, NBody & X265.

I also noticed that there is an error, affecting Kaby Lake results in 4C/4T, 4C/8T and SKU vs. SKU tests. The absolute data is correct, however the summary views are affected by a tiny amount.
This is because I made a typo in "Caselab Euler3D" calculation for Kaby Lake (reversed calculation). Because of that the summary views will show small changes for Kaby Lake only. In 4C/4T summary the difference will reduce by >= 0.5% and in 4C/8T increase by ~1% (same applies to SKU vs. SKU).

I'll leave the gallery codes (Imgur) for the original charts in the OP after changing the charts, so anyone can inspect the results if necessary.

agouraki · Mar 6, 2017

Has anyone yet tried using Process Lasso to pin a game that uses at most 8 threads to a single CCX and compared the results?

R0H1T · Mar 6, 2017

agouraki said:
Has anyone yet tried using Process Lasso to pin a game that uses at most 8 threads to a single CCX and compared the results?

You don't need Process Lasso for that; Process Hacker or Process Explorer will do. Process Lasso isn't a bad choice, but it also affects the game/application and could result in unrealistic gains compared with systems that don't have it installed.
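
As a rough sketch of what pinning to one CCX means in practice: an affinity mask is just a bitmask with one bit per logical processor. The snippet below builds such a mask, assuming (this is not stated in the thread) that logical processors are numbered contiguously with SMT siblings adjacent, so that CCX 0 owns logical processors 0-7 on an 8C/16T part. The function names are made up for illustration.

```python
def ccx_affinity_mask(ccx_index, cores_per_ccx=4, threads_per_core=2):
    """Return a bitmask selecting every logical processor of one CCX.

    Assumption: logical processors are enumerated contiguously, CCX by CCX,
    with the two SMT threads of each core next to each other.
    """
    per_ccx = cores_per_ccx * threads_per_core
    return ((1 << per_ccx) - 1) << (ccx_index * per_ccx)


def logical_cpus(mask):
    """List the logical processor indices whose bit is set in the mask."""
    return [i for i in range(mask.bit_length()) if (mask >> i) & 1]


for ccx in (0, 1):
    mask = ccx_affinity_mask(ccx)
    print(f"CCX{ccx}: mask 0x{mask:X}, logical CPUs {logical_cpus(mask)}")
```

The hex value is the kind of mask the Windows `start /affinity` command accepts, e.g. `start /affinity FF game.exe`, where `game.exe` is a placeholder. Whether the enumeration assumption holds should be checked on the actual system before trusting the mask.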

The Stilt · Mar 6, 2017

The charts have been updated (y-axis).

Triskain · Mar 6, 2017

Hello Stilt, can you share some information about the current and load-line specification for the AM4 socket infrastructure? This information would be useful for evaluating motherboard VRMs.

KTE · Mar 6, 2017

Martin Schou said:
I'm sorry, but where's the rest of this graph? I picked this one, but I could have picked any of at least a dozen others that suffer from the same problem as this one does.

By cutting out 85% of this graph, you've made Haswell and Kaby Lake look like they have at least twice the performance of Zen. In other graphs it's a different story with Zen apparently being massively better than the others. Stop it. Stop misleading and deceiving people with these misleading graphics.

It's not even consistent. In some graphs you have included the entire thing. For example

But at a glance (and first impressions matter) the first graph shows a far more impressive performance lead than the second one does, even though the second one has a 52% margin!

Stop it. Stop being deceitful, stop being misleading.

It's not misleading when you have the graphs CLEARLY marked in percentage difference. Quit the over-sensitivity.

Edit: better word.

Sent from HTC 10
(Opinions are own)

The Stilt · Mar 6, 2017

Triskain said:
Hello Stilt, can you share some information about the current and load-line specification for the AM4 socket infrastructure? This information would be useful for evaluating motherboard VRMs.

Not for the final silicon unfortunately.
They changed drastically in the final silicon.

The Stilt · Mar 6, 2017

Pookums said:
Lurker here.

I noticed that the overclocking and Ryzen Master guide on AMD's official website lists both FCLK and UCLK as values for overclocking on the Ryzen platform. I do not have a Ryzen and motherboard at present, and I have not seen them listed for modification in any BIOS unless they are located under "advanced core options." Can these values be modified, Stilt? You mentioned some BIOS options that are currently hidden.

Now the FCLK (I believe this is Infinity Fabric on AMD) seems plenty fast enough (even superior to Intel's ring bus). However, the UCLK memory controller speeds seem downright abhorrent (at half memory speed), and I believe this might be the major cause of the poor memory latency and the odd game and draw call behavior.

I read back on Intel's Nehalem architecture and its changes going into Kaby Lake today. Originally the ideal was to have 2x the uncore to 1x full memory speed (i.e. with DDR-1600, a 3.2GHz uncore or higher was ideal). This has moved toward parity as DDR speeds have substantially improved. However, there have been warnings on some throwaway threads on message boards to always set the uncore ABOVE the complete memory speed, otherwise it could substantially reduce performance. Some suggested this was most noticeable in Linpack and games.

Even the 6900K has a default 2.8GHz uncore. When using faster memory and overclocking, it's suggested to raise the 6900K's uncore to 3.2GHz, up to a maximum of around 3.5GHz (heat supposedly becomes an issue above this point).

So my question is, can the memory controller be modified in any way to achieve a much higher UCLK (their webpage only listed that it exists and didn't mention where to modify it, or even whether the option is in any Ryzen BIOS)?

If not, can anyone take a current-gen Intel i7 (preferably a Broadwell), downclock its uncore to the abysmal speeds of Ryzen's memory controller, and compare majincry's draw call benchmark, memory latency measurements and games before and after?

UCLK, FCLK & DFICLK default to half of the effective MEMCLK frequency (i.e. DDR-2400 = 1200MHz).
There is a way to configure the memory controller (UCLK) for 1:1 rate, however that is strictly for debug and therefore completely untested. The end-user has neither the knowledge nor the hardware to change it.
AFAIK FCLK & DFICLK are both fixed and cannot be tampered with. However, certain related fabrics which run at the same speed have their own frequency control. The "infinity fabric" (GMI) runs at 4x FCLK frequency.
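
The default relationships described in this post can be written down as a short calculation. This is only a sketch of the arithmetic as stated here (half the effective MEMCLK for UCLK, FCLK and DFICLK, and 4x FCLK for GMI); the function name and the dictionary layout are hypothetical, and the untested 1:1 debug mode is deliberately left out.

```python
def default_fabric_clocks(ddr_rating):
    """Derive the default clocks in MHz from a DDR rating such as 2400.

    ddr_rating is the effective transfer rate (DDR-2400 -> 2400).
    """
    half_rate = ddr_rating / 2  # UCLK, FCLK and DFICLK default to this
    return {
        "UCLK": half_rate,
        "FCLK": half_rate,
        "DFICLK": half_rate,
        "GMI": 4 * half_rate,  # "infinity fabric" runs at 4x FCLK
    }


for rating in (2133, 2400, 2666, 3200):
    clocks = default_fabric_clocks(rating)
    print(f"DDR-{rating}: UCLK/FCLK/DFICLK {clocks['UCLK']:.1f} MHz, "
          f"GMI {clocks['GMI']:.1f} MHz")
```

For DDR-2400 this gives 1200MHz for UCLK, FCLK and DFICLK, matching the example above, and 4800MHz for GMI.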

KTE · Mar 6, 2017

The Stilt said:
Frankly I don't believe that the 14nm LPP will significantly improve any more. 14nm LPE, which is extremely closely related to LPP, has been in mass production for 21 months now. Also the 14nm LPP itself has been in mass production for 15 months.
AMD most likely could tweak a thing or two in the design in order to reach higher Fmax, however I fear that most of the tricks were already used in the final stepping of Zeppelin.

For the upcoming Zen iterations I would rather see AMD staying on a process, which has been fully tested in practice. Regardless if it means falling behind in the absolute node size. 16nm FF+ is extremely well proven and also 14nm HP (IBM) should be available at some point from GlobalFoundries. Moving immediately to a new 7nm node would be like taking a head dive into unknown waters, once more.

Did you miss my questions? Post #38776573.

I agree on LPP/LPE, but I bet AMD can still eke out 200MHz more, especially in XFR. I also suspect they will get that 1800 with a smaller boost to 65W. This is what LPs are made for.

A port to another process would take more than a year, and for what benefit? It's not that simple to move a new design from LP to HP.

The only one worth it is GF's HP, which IBM will use.

But I am pretty sure Ryzen will blow the power budget with that, and I really don't think the design has it to hit much higher clocks. With that process, you're sacrificing power for clocks.

3.2GHz 95W -> 3.6GHz 125W -> 4.2GHz 140W -> 4.4GHz 200W.

(Estimated actual HVM power)
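
To make the shape of that estimate easier to see, here is a small sketch that works out the marginal cost, in watts per additional 100MHz, between the points listed above. The figures are the poster's own estimates, not measurements, and the function name is hypothetical.

```python
# (frequency in MHz, estimated power in W), copied from the post above
ESTIMATES = [(3200, 95), (3600, 125), (4200, 140), (4400, 200)]


def marginal_watts_per_100mhz(points):
    """Return the extra watts per extra 100 MHz for each consecutive pair."""
    steps = []
    for (f0, p0), (f1, p1) in zip(points, points[1:]):
        steps.append((p1 - p0) / ((f1 - f0) / 100))
    return steps


costs = marginal_watts_per_100mhz(ESTIMATES)
for (f0, _), (f1, _), cost in zip(ESTIMATES, ESTIMATES[1:], costs):
    print(f"{f0} -> {f1} MHz: {cost:.1f} W per extra 100 MHz")
```

On these numbers the last 200MHz step is by far the most expensive one, which is the point being made about sacrificing power for clocks.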

Sent from HTC 10
(Opinions are own)

Pookums · Mar 6, 2017

The Stilt said:
UCLK, FCLK & DFICLK default to half of the effective MEMCLK frequency (i.e. DDR-2400 = 1200MHz).
There is a way to configure the memory controller (UCLK) for 1:1 rate, however that is strictly for debug and therefore completely untested. The end-user has neither the knowledge nor the hardware to change it.
AFAIK FCLK & DFICLK are both fixed and cannot be tampered with. However, certain related fabrics which run at the same speed have their own frequency control. The "infinity fabric" (GMI) runs at 4x FCLK frequency.

Thanks for the response. Is it possible they could be convinced to work with motherboard manufacturers on a BIOS update so UCLK could be changed for normal use outside of debug mode, possibly with a public release of technical information on its frequency and/or voltage limits, both 24/7 safe and maximum?

While I currently have no way of knowing for certain, I'm fairly confident low UCLK is the major reason for the poorer-than-expected showing in memory latency and in certain benchmarks where it was unable to maintain parity with the 6900K.

Other clock rates seem more than sufficient (Infinity Fabric itself seems extraordinary), which is why I'm primarily focused on this single issue.

The Stilt · Mar 6, 2017

Pookums said:
Thanks for the response. Is it possible they could be convinced to work with motherboard manufacturers on a BIOS update so UCLK could be changed for normal use outside of debug mode, possibly with a public release of technical information on its frequency and/or voltage limits, both 24/7 safe and maximum?

While I currently have no way of knowing for certain, I'm fairly confident low UCLK is the major reason for the poorer-than-expected showing in memory latency and in certain benchmarks where it was unable to maintain parity with the 6900K.

Other clock rates seem more than sufficient (Infinity Fabric itself seems extraordinary), which is why I'm primarily focused on this single issue.

I don't think it's even possible without external hardware (HDT), so pretty much no.

PPB · Mar 6, 2017

The Stilt said:
I'm not sure.
Gen. 3 can usually be sustained up to 107MHz though.

This is without an external CLK generator, I take it? Some people are now fussing about which motherboards have one and which don't. IMO all motherboards should be able to overclock the BCLK, but only the ones with an external CLK generator would do so without the PCIe and SoC SATA corruption problems.

The Stilt · Mar 6, 2017

PPB said:
This is without an external CLK generator, I take it? Some people are now fussing about which motherboards have one and which don't. IMO all motherboards should be able to overclock the BCLK, but only the ones with an external CLK generator would do so without the PCIe and SoC SATA corruption problems.

PCIe is tied to the BCLK, regardless of whether an external or internal PLL (FCH) is used. The linking occurs at silicon level so there is nothing you can do about it.
External PLLs are more precise, but that's about it. It generally makes no difference for overclocking the BCLK.
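
A rough way to picture why the PCIe generation matters here: if the link rate simply scales in proportion to the BCLK (an assumption made for this sketch, the thread only says the two are tied together), then raising the BCLK pushes every generation above its nominal per-lane rate. The per-lane figures are the nominal PCIe rates, 100MHz is the nominal BCLK, and the function name is hypothetical.

```python
NOMINAL_BCLK_MHZ = 100.0

# Nominal per-lane signalling rates in GT/s for each PCIe generation
PCIE_NOMINAL_GT_S = {1: 2.5, 2: 5.0, 3: 8.0}


def scaled_pcie_rate(gen, bclk_mhz):
    """Per-lane rate in GT/s, assuming it scales linearly with the BCLK."""
    return PCIE_NOMINAL_GT_S[gen] * bclk_mhz / NOMINAL_BCLK_MHZ


for bclk in (100.0, 107.0, 120.0):
    rates = ", ".join(
        f"Gen. {gen}: {scaled_pcie_rate(gen, bclk):.2f} GT/s"
        for gen in sorted(PCIE_NOMINAL_GT_S)
    )
    print(f"BCLK {bclk:.0f} MHz -> {rates}")
```

Under that assumption, a 120MHz BCLK in Gen. 1 mode still runs the link well below nominal Gen. 2 speed, while Gen. 3 at 107MHz is already about 7% over its nominal rate.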

Blitzvogel · Mar 6, 2017

Ryzen Tested as "Quad-Core" by Forced Disabling

moonbogg · Mar 6, 2017

Please excuse my ignorance, but I am just curious about all of this CCX talk limiting Ryzen's gaming performance. People seem to be placing their hopes in the upcoming Windows update to fix Ryzen gaming performance. Is there any reality to this? I personally expect the performance of the chips to stay the same, but I don't know, so here I am asking the smart people.

Malogeek · Mar 6, 2017

Blitzvogel said:
Ryzen Tested as "Quad-Core" by Forced Disabling

With a single CCX and clock-for-clock, it's equal to the 7700K in gaming performance as well. That's very impressive for their first generation of a new architecture. Would love to see more tests like this to confirm the results.

It obviously shows that work needs to be done on the scheduler and the CCX block architecture in full 8C/16T mode, which is being reiterated in many other places as well, but that's damn nice from AMD.

Malogeek · Mar 6, 2017

moonbogg said:
People seem to be placing their hopes in the upcoming Windows update to fix Ryzen gaming performance. Is there any reality to this?

Probably somewhere in the middle. There will be improvements, but the problem is likely not something that can be completely removed. New CPU architectures have a history of such things. What's great, in my opinion, is what we're starting with.

The Stilt · Mar 6, 2017

I'm certain that the performance will improve, but not because of a single fix.
The issues we are seeing are caused by multiple different things.

PPB · Mar 6, 2017

moonbogg said:
Please excuse my ignorance, but I am just curious about all of this CCX talk limiting Ryzen's gaming performance. People seem to be placing their hopes in the upcoming Windows update to fix Ryzen gaming performance. Is there any reality to this? I personally expect the performance of the chips to stay the same, but I don't know, so here I am asking the smart people.

Think about two-processor systems. How does the OS know that the L3 cache in processor 1 is not shared at the silicon level with the L3 cache of processor 2, so that the cores of processor 1 aren't tasked with searching for data in processor 2 (which would be damn slow)? With NUMA nodes. This is how you differentiate two processors' memory subsystems and avoid such conflicts. This can be a really easy task for the kernel/OS if done. Hopefully it's implemented soon so we don't see more cache thrashing and thread shifting between cores from different CCXs.

Sent from my XT1040 using Tapatalk

Malogeek · Mar 6, 2017

Here's an interesting comparison to the 6900K in the same situation. It was just a quick search with little actual research.

http://techbuyersguru.com/intels-core-i5-6600k-vs-i7-6700k-vs-i7-6900k-games?page=3

And, indeed, we found the key to the 6900K's poor performance. By disabling four of its cores, we were able to make the 6900K perform just like a quad-core! Ironic that it requires this step to allow the eight-core CPU to keep pace with its $250 and $350 cousins.

raghu78 · Mar 6, 2017

Wow, that Ryzen tested as a quad-core at 4GHz against a 7700K at 4GHz is quite good. In fact, the inter-CCX bandwidth problem is automatically solved by having a single CCX with all 4C/8T. I think if AMD can use a single CCX for the R5 1500X, with clocks of 3.9-4.0GHz for 1-2 cores and 3.7GHz for all-core turbo, then it would be a very good gaming chip too, especially if AMD can price it at USD 179 (instead of USD 199). It will basically be very close to the Core i5 7600K but at a significantly lower price. In fact, Intel's non-K Skylake chips would have a tough time against it.

Harney · Mar 6, 2017

Blitzvogel said:
Ryzen Tested as "Quad-Core" by Forced Disabling

Interesting
