Ryzen: Strictly technical


The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
Turns out the base clock generator will allow for higher RAM overclocks right now.

The way AMD has their BIOS set up, you cannot yet change the memory timings, and the high multipliers come with very loose timings. The base clock generator works around this by isolating the Base Clock of the RAM from the CPU, so you can use a lower multiplier together with a higher base clock. That in turn allows for higher memory overclocks without the very loose timings of the high multipliers.

It might be an advantage considering DRAM is the last level cache.

Buildzoid has a pretty good video about it:






It may be best to tweak to the settings you like, then disable HPET.

This has nothing to do with "isolating the BCLK".
There are various bugs & issues in the current AGESA / PMU firmware version.
The higher memory ratios have not been working (even remotely) until very recently and they still have some pending issues.

In order to reach higher MEMCLKs it is currently better to use the lower memory ratios, which have been available since the beginning and therefore are fully functional and validated.
Using a lower memory ratio of course means that the only way to reach high MEMCLKs is to increase the BCLK. The same exact thing can be achieved with the internal clock generator.
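
To put rough numbers on that, here is a minimal sketch of the arithmetic; the ratio and BCLK values are illustrative examples, not validated settings:

```python
# Minimal sketch of the MEMCLK arithmetic on Ryzen (illustrative values only).
# MEMCLK = BCLK * memory ratio; effective DDR4 data rate = 2 * MEMCLK.
# Note: the PCI-E reference clock follows BCLK as well, hence the forced Gen. 2 mode.

def ddr_rate(bclk_mhz: float, memclk_ratio: float) -> float:
    """Effective DDR4 data rate in MT/s."""
    return 2.0 * bclk_mhz * memclk_ratio

print(ddr_rate(100.00, 16.00))  # 3200 MT/s -- highest ratio at stock 100 MHz BCLK
print(ddr_rate(120.00, 13.33))  # ~3199 MT/s -- lower ratio plus raised BCLK, same data rate
print(ddr_rate(118.16, 14.66))  # ~3465 MT/s -- the kind of figure currently only reachable by raising BCLK
```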
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
Here is what Igor Pavlov writes about that:

"About 7-Zip / LZMA speed for AMD Ryzen R7.

Decompression speed is OK at Ryzen.
Compression speed in fast mode with small dictionary probably is OK also.
Compression speed with big dictionary is not good. Compression with big dictionary uses big amount of memory and it needs low memory access latency.

And memory access latency is BAD for Ryzen R7.
Look at the following review with memory tests:
http://www.hardware.fr/articles/956-22/retour-sous-systeme-memoire.html

Also maybe shared cache in Intel CPUs is better than two separated caches in Ryzen CPUs for multithreaded LZMA compressing. Probably AMD will ask Microsoft to improve thread scheduling to reduce thread walking from one CCX to another CCX. Maybe such fixed thread scheduling can help slightly in some cases, but I'm not sure that it will help for 7-Zip compression.

Probably special thread scheduler that can be embedded to 7-Zip program will help, but it's difficult to develop it. Some versions of Windows don't like when program changes thread affinity. So such feature requires big development tests with different versions of Windows and different types of CPUs. It can be difficult.

But any improvement for memory latency will help for compression speed. I suppose it's difficult for AMD to reduce memory latency in current Ryzens. I hope they will try to fix it in next Ryzen revisions, if they will have strong understanding how memory latency is important for some programs.

I didn't contact AMD about Ryzen.
"


I am not sure that multi-threading is the main benefit here; rather, RAR5 archives are overall faster to decompress. This is even true if you decompress them via 7-Zip, which only uses a single core for their decompression.

I did a quick comparison, compressing a bunch of PDF files into solid archives of 4.7 GB (7Z) and 4.61 GB (RAR5), which is about an 80% compression ratio. Best/Ultra compression setting, 64 MB dictionary size, 4 GB solid blocks. Decompression times:

7Z via 7-Zip: 1:50 min
7Z via Winrar: 1:54 min
RAR5 via 7-Zip: 0:32 min
RAR5 via Winrar: 0:22 min

Yes, multi-threaded decompression via Winrar results in 31% faster decompression, but even via single-threaded decompression the RAR5 format decompresses in only 29% of the time that the 7Z archive takes. Interestingly Winrar taxes a second core for decompressing the 7Z archive, but still takes longer.

All that being said, if a 4.7 GB archive takes less than 2 minutes to decompress, then you really have to do these chores regularly for the times to matter.

What I take from this is that 7-Zip decompression should maybe be run with its affinity manually set to a single CCX.

The DRAM latency on Zeppelin is indeed extremely bad (similar to every IMC on AMD designs after Piledriver).
However if you compare the results I posted for Haswell 4C/8T (QC 2133) & Kaby Lake 4C/8T (DC 2400) you'll see that despite Haswell having higher latency, it is ~13.4% faster than Kaby Lake in 7-Zip compression.
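
On the idea of pinning 7-Zip to a single CCX: here is a minimal sketch using psutil, assuming logical CPUs 0-7 map to CCX0 on an 8C/16T part with SMT enabled (the CPU numbering and the process names are assumptions; verify the topology on your own system):

```python
# Sketch: restrict running 7-Zip processes to one CCX via psutil (pip install psutil).
# Assumes logical CPUs 0-7 belong to CCX0 on an 8C/16T Ryzen with SMT enabled.
import psutil

CCX0 = list(range(8))  # logical CPUs assumed to sit on the first CCX

for proc in psutil.process_iter(["name"]):
    name = (proc.info["name"] or "").lower()
    if name in ("7z.exe", "7zg.exe", "7zfm.exe"):
        proc.cpu_affinity(CCX0)  # pin the process to CCX0 only
        print(f"Pinned {proc.info['name']} (pid {proc.pid}) to CPUs {CCX0}")
```

On Windows, something like `start /affinity FF 7z.exe ...` or Task Manager can do the same thing ad hoc, without any scripting.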
 

unseenmorbidity

Golden Member
Nov 27, 2016
1,395
967
96
This has nothing to do with "isolating the BCLK".
There are various bugs & issues in the current AGESA / PMU firmware version.
The higher memory ratios have not been working (even remotely) until very recently and they still have some pending issues.

In order to reach higher MEMCLKs it is currently better to use the lower memory ratios, which have been available since the beginning and therefore are fully functional and validated.
Using a lower memory ratio of course means that the only way to reach high MEMCLKs is to increase the BCLK. The same exact thing can be achieved with the internal clock generator.
If I increase BCLK to, say, 120, then I will be forced to PCIe 2.0.

If I Crossfire I end up with PCIe 3.0 x8 / PCIe 3.0 x8.

What would happen if I did both, and would it heavily affect performance?

Would I have

PCIe 2.0 x16 / PCIe 2.0 x16

or

PCIe 2.0 x8 / PCIe 2.0 x8?
 

unseenmorbidity

Golden Member
Nov 27, 2016
1,395
967
96
That was what I figured would happen, unfortunately. Would that impact performance a great deal, or would that still be sufficient for dual 480s?
 

inf64

Diamond Member
Mar 11, 2011
3,685
3,957
136
  • Like
Reactions: Drazick

unseenmorbidity

Golden Member
Nov 27, 2016
1,395
967
96
Not quite sure you want to run high bclk OC 24/7.
Interesting, why not? My old Xeon X5650 runs 185 BCLK. I don't run it all the time though, 40 hours a week at most probably.

I was reading this from stilt,

Ryzen's Data Fabric is locked 1:2 RAM speed, Faster RAM = Faster Ryzen.

I figured bclk would be a good way to increase performance. Hopefully the motherboards just allow faster ram though.

I think a Biostar board and the G7 list 3600, but they seem MIA.
 

CrazyElf

Member
May 28, 2013
88
21
81
This has nothing to do with "isolating the BCLK".
There are various bugs & issues in the current AGESA / PMU firmware version.
The higher memory ratios have not been working (even remotely) until very recently and they still have some pending issues.

In order to reach higher MEMCLKs it is currently better to use the lower memory ratios, which have been available since the beginning and therefore are fully functional and validated.
Using a lower memory ratio of course means that the only way to reach high MEMCLKs is to increase the BCLK. The same exact thing can be achieved with the internal clock generator.


Can you elaborate on that? Are you saying that Buildzoid (the author of the previously linked video) is wrong?

Right now there are a few kits coming out. An example - G.Skill is releasing a 3466 MT/s kit soon. https://www.gskill.com/en/press/vie...s-and-fortis-series-ddr4-memory-for-amd-ryzen




Obviously there is no native setting to do that in Ryzen. There are fixed multipliers. So you are basically saying that they are shifting to PCIe 2.0 and that the Base Clock is doing nothing?

It's an interesting question - we are seeing huge gains right now with faster RAM, even at the cost of loose timings.


[Image: Ryzen memory scaling benchmark]


[Image: Ryzen memory OC benchmark, Arma 3]


It would be very interesting if we could get some benchmarks with 3466. There seem to be gains even with the looser timings of high speed RAM. I don't think we can do it right now, but it'd be awesome to see >4000 MT/s RAM. Even with loose timings, I wonder if we will see gains.
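
On the "loose timings" point, it helps to convert CAS latency into absolute time; the CL values below are hypothetical examples, not tested Ryzen settings:

```python
# First-word CAS latency in nanoseconds: CL cycles divided by MEMCLK (= data rate / 2).
# The CL values are hypothetical examples for illustration.
def cas_ns(data_rate_mts: int, cl: int) -> float:
    return cl / (data_rate_mts / 2) * 1000  # equivalent to 2000 * CL / data rate

for rate, cl in [(3200, 14), (3466, 18), (4000, 19)]:
    print(f"DDR4-{rate} CL{cl}: {cas_ns(rate, cl):.2f} ns")
# 8.75 ns, 10.39 ns, 9.50 ns -- the absolute penalty grows much more slowly than the raw CL number
```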

Combine that with disabling SMT and the "games performance gap" that AMD's Ryzen has with Intel is pretty much gone.
 
Last edited:

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
Interesting, why not? My old Xeon X5650 runs 185 BCLK. I don't run it all the time though, 40 hours a week at most probably.

I was reading this from stilt,

Ryzen's Data Fabric is locked 1:2 RAM speed, Faster RAM = Faster Ryzen.

I figured bclk would be a good way to increase performance. Hopefully the motherboards just allow faster ram though.

I think a Biostar board and the G7 list 3600, but they seem MIA.

Westmere has a separate PCI-E frequency, which is not tied to BCLK. That's not the case on Ryzen, where BCLK = PCI-E frequency.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
Can you elaborate on that? Are you saying that Buildzoid (the author of the previously linked video) is wrong?

Right now there are a few kits coming out. An example - G.Skill is releasing a 3466 MT/s kit soon. https://www.gskill.com/en/press/vie...s-and-fortis-series-ddr4-memory-for-amd-ryzen


Obviously there is no native setting to do that in Ryzen. There are fixed multipliers. So you are basically saying that they are shifting to PCIe 2.0 and that the Base Clock is doing nothing?

It's an interesting question - we are seeing huge gains right now with memory OCs.


It would be very interesting if we could get some benchmarks with 3466. There seem to be gains even with the looser timings of high speed RAM. I don't think we can do it right now, but it'd be awesome to see >4000 MT/s RAM. Even with loose timings, I wonder if we will see gains.

Combine that with disabling SMT and the "games performance gap" that AMD's Ryzen has with Intel is pretty much gone.

I'm not quite sure what you're asking.
They are increasing the BCLK from 100 MHz and using a DRAM ratio lower than the highest available one.
But regardless, nothing has been "separated"; the PCI-E still runs at the BCLK speed (e.g. 118.16 MHz), and because of that running in Gen. 2 mode is mandatory.
 

unseenmorbidity

Golden Member
Nov 27, 2016
1,395
967
96
Westmere has a separate PCI-E frequency, which is not tied to BCLK. That's not the case on Ryzen, where BCLK = PCI-E frequency.
I figured this was compensated for by the fact that it dropped to PCIe 2.0. Is that not the case?

Sorry, for all the questions.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,710
3,554
136
@The Stilt Sorry if this has been asked before and you've already answered it - is the limitation of BCLK ranges requiring PCI-e to run at older gen. speeds something that can be addressed through future BIOS updates?
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
I figured this was compensated for by the fact that it dropped to PCIe 2.0. Is that not the case?

Sorry, for all the questions.

It is, but there are still no guarantees that everything is working completely properly and reliably.
Personally I'm not willing to risk stability and data integrity for a slightly higher memory clock, which provides diminishing improvements over the highest frequency available at 100 MHz BCLK (3200 MHz).
 
  • Like
Reactions: Drazick

CrazyElf

Member
May 28, 2013
88
21
81
Will contact G.Skill about this one.

Still, even with PCIe 2.0 it's not a huge limitation for GPUs and gaming.

The big problem is for people who want to run a PCIe SSD like the SSD 750 and get fast RAM at the same time, if what you are saying is true.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
Will contact G.Skill about this one.

Still, even with PCIe 2.0 it's not a huge limitation for GPUs and gaming.

The big problem is for people who want to run a PCIe SSD like the SSD 750 and get fast RAM at the same time, if what you are saying is true.

Yeah, graphics cards are not the problem since they do just fine in Gen. 2 mode.
However fast M.2 drives can suffer, since they only use four lanes.
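
To put rough numbers on the link widths involved, here is a back-of-the-envelope sketch of per-direction bandwidth, ignoring protocol overhead:

```python
# Approximate per-direction PCIe link bandwidth in GB/s.
# Gen 2: 5 GT/s with 8b/10b encoding; Gen 3: 8 GT/s with 128b/130b encoding.
GBPS_PER_LANE = {2: 5.0 * (8 / 10) / 8, 3: 8.0 * (128 / 130) / 8}  # ~0.50 and ~0.98

def link_bw(gen: int, lanes: int) -> float:
    return GBPS_PER_LANE[gen] * lanes

for gen, lanes in [(3, 16), (3, 8), (2, 16), (2, 8), (2, 4)]:
    print(f"PCIe {gen}.0 x{lanes}: ~{link_bw(gen, lanes):.1f} GB/s")
# 3.0 x16 ~15.8 | 3.0 x8 ~7.9 | 2.0 x16 ~8.0 | 2.0 x8 ~4.0 | 2.0 x4 ~2.0
```

So a Gen. 2 x4 link tops out around 2 GB/s, which is below what the fastest NVMe drives can sustain, while even Gen. 2 x8 still leaves roughly 4 GB/s per graphics card.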
 

Timur Born

Senior member
Feb 14, 2016
277
139
116
What happens with all the other PCIe connected devices like Asmedia USB/SATA controllers, audio cards, video capture cards, Firewire cards and the like?

In the CH6 overclocking guide it is mentioned that you should use the CPU bound USB 3.0 ports when BCLK is increased. So what happens to the chipset bound USB and SATA ports?
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
What happens with all the other PCIe connected devices like Asmedia USB/SATA controllers, audio cards, video capture cards, Firewire cards and the like?

In the CH6 overclocking guide it is mentioned that you should use the CPU bound USB 3.0 ports when BCLK is increased. So what happens to the chipset bound USB and SATA ports?

The PCI-E frequency is common for all devices.

No idea what other issues there might be, since I haven't extensively tested it (for obvious reasons).
Corrupting the M.2 drive at 107MHz (Gen. 3) was enough for me.
 

looncraz

Senior member
Sep 12, 2011
722
1,651
136

Timur Born

Senior member
Feb 14, 2016
277
139
116
I am not sure that multi-threading is the main benefit here; rather, RAR5 archives are overall faster to decompress. This is even true if you decompress them via 7-Zip, which only uses a single core for their decompression.
I did more tests compressing 28 GB of uncompressed audio files to 17.5 GB (7Z) and 14.5 GB (RAR5) solid archives on my 4790K. All cores were set to a fixed 4 GHz, no matter how many threads were used.

Both programs utilize about 75-80% of all 8 of my logical cores for compression; no discernible difference in core usage between the two programs there.

For decompression, 7-Zip only uses a single core for both 7Z and RAR archives. WinRAR uses only 2-3 logical cores for RAR archives and a single core for 7Z archives.

The difference between decompressing the RAR file on a single core compared to allowing all cores (only 2-3 used) was 3:20 min to 3:00 min.

The difference between decompressing the RAR file and decompressing the 7Z file was 3 min to over 15 min. So again the format makes a huge difference, throwing more cores at Winrar only makes a difference for the first 2 or 3 cores.

Changing the archive type to non-solid decreased WinRAR's core usage to about 60% for compression, with no difference for decompression. 7-Zip's core usage went down to only 2 logical cores for compression, though.

PS: Since during these tests I found that RAR5 is a real step forward for WinRAR, I have now bought it. So the tests were good for something. :p
 
Last edited:
  • Like
Reactions: lightmanek

guskline

Diamond Member
Apr 17, 2006
5,338
476
126
That was what I figured would happen unfortunately. Would that impact performance a great deal or would that still be sufficient for dual 480's?
unseenmorbidity, in the "for whatever it's worth" department: as you can see from the sig specs of my Ryzen 1800X rig below (running at stock speeds), I'm running 2 RX 480s in CF on a B350 chipset, where I think the second card is only running at PCIe 2.0 x4. It's still fast!
 

JDG1980

Golden Member
Jul 18, 2013
1,663
570
136
The DRAM latency on Zeppelin is indeed extremely bad (similar to every IMC on AMD designs after Piledriver).

Hopefully AMD can fix this in "Pinnacle Ridge". If they can spare the resources, it might be a good idea for them to design their own IMC instead of using one licensed from Rambus.
 