
Ryzen: Strictly technical

Status
Not open for further replies.

Jan Olšan

Senior member
Jan 12, 2017
That Cinebench result at 30W looks great (not bad at 45W either). These chips should probably be great in the MCM Opterons, right? Probably much better suited to that than to 3.6-4.1 GHz desktop parts.
 

The Stilt

Golden Member
Dec 5, 2015
Quote:
Absolutely awesome, Stilt! I'm going to go as far as to say that you blew every review site out of the water with this one.

Out of curiosity, was SMT/HT enabled or disabled for your IPC measurements?

All of the designs had their multithreading (SMT/CMT) enabled during testing, since that is the configuration end users will actually have.

I've verified some of the individual results with multithreading disabled on all of the designs, and the differences fell within the standard deviation of the results (the final results are 3RA, i.e. three-run averages).
IIRC, NBody showed a slight improvement on Excavator with CMT disabled; however, that's pretty much irrelevant, since that configuration isn't allowed by default.
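The comparison described above (three-run averages, with a difference treated as noise when it falls within the run-to-run standard deviation) can be sketched as follows. This is a minimal illustration, not The Stilt's actual procedure: the function names are hypothetical, and reading "within the standard deviation" as "the gap between the two averages is no larger than the larger of the two sample standard deviations" is an assumption.

```python
from __future__ import annotations

from statistics import mean, stdev


def three_run_average(runs: list[float]) -> float:
    """Average of exactly three benchmark runs ("3RA")."""
    if len(runs) != 3:
        raise ValueError("expected exactly three runs")
    return mean(runs)


def within_run_to_run_noise(runs_a: list[float], runs_b: list[float]) -> bool:
    """Assumed criterion: the gap between the two three-run averages is no
    larger than the larger sample standard deviation of the two run sets."""
    gap = abs(three_run_average(runs_a) - three_run_average(runs_b))
    return gap <= max(stdev(runs_a), stdev(runs_b))
```

With SMT/CMT on and off as the two configurations, a `True` result would correspond to "the differences fell within the standard deviation".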
 

bjt2

Senior member
Sep 11, 2016
You should compare with an 8-core part. It's no surprise that an 8-core, even at 35W, outperforms a 35W 4-core, since this is an extremely parallelizable task... Is there some low-power 8-core part (I'm thinking of the low-power Xeons for dense datacenters)?
 

iBoMbY

Member
Nov 23, 2016
About CCX scheduling in Windows: does anyone dare to try setting this?

Code:
bcdedit.exe /set groupsize 8
According to this article, it could force Windows to split Ryzen into two NUMA groups of 8 logical cores, one per CCX. My Ryzen won't arrive before tomorrow, so I can't check whether this still works or whether it changes anything.

Edit: I removed the "bcdedit.exe /deletevalue groupaware" setting; it could have a negative impact.

Edit2: By default, the Windows scheduler confines all threads of a process to one of the NUMA groups. This can of course have a negative impact! If you start 8 threads in IntelBurnTest, for example, it only uses 4 logical cores, not 8 as it normally does. The groupaware setting forces everything onto the second group; you definitely don't want that.

Edit3: Okay, I wouldn't recommend my bcdedit processor group settings anymore, because they force all threads of a process onto one of the groups, so a single application would effectively only ever use half the CPU. The setting is too heavy-handed.
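As a rough mental model of the groupsize behaviour described in this post, the sketch below packs logical processors into consecutive groups and caps a process whose threads stay in one group at the size of that group. It is a simplification under stated assumptions: the function names are hypothetical, real Windows group assignment also follows NUMA node boundaries, and logical processors 0-7 being the first CCX is assumed. It does reproduce the two cases from this thread: a 16-thread Ryzen split into two groups of 8, and 8 IntelBurnTest threads landing on only 4 logical cores with groupsize 4.

```python
from __future__ import annotations


def split_into_groups(logical_cpus: int, groupsize: int) -> list[list[int]]:
    """Simplified model: pack logical processors 0..n-1 into consecutive
    groups of at most `groupsize` (real Windows also respects NUMA nodes)."""
    if logical_cpus < 1 or groupsize < 1:
        raise ValueError("logical_cpus and groupsize must be positive")
    return [
        list(range(start, min(start + groupsize, logical_cpus)))
        for start in range(0, logical_cpus, groupsize)
    ]


def usable_cpus_for_process(threads: int, group: list[int]) -> int:
    """A process whose threads are all confined to one group can keep at
    most len(group) logical processors busy at the same time."""
    return min(threads, len(group))


if __name__ == "__main__":
    # 16-thread Ryzen, groupsize 8: two groups (ideally one per CCX)
    print(split_into_groups(16, 8))
    # 8-thread Intel test box, groupsize 4: 8 threads end up on 4 CPUs
    intel_groups = split_into_groups(8, 4)
    print(usable_cpus_for_process(8, intel_groups[0]))  # 4
```

This is also why the setting ends up "too strong": under this model, any single application is capped at half of a CPU split into two groups.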
 

The Stilt

Golden Member
Dec 5, 2015
Noticed a funny thing with the L3 latency reported by AIDA.
In Win 10 the latency is all over the place (19.7-56ns). In Win 7 the latency is 19.5 - 20.2ns, regardless of the settings used.
Same setup, same version of AIDA (newest beta from 26th).
 

imported_jjj

Senior member
Feb 14, 2009
The Stilt said:
Noticed a funny thing with the L3 latency reported by AIDA.
In Win 10 the latency is all over the place (19.7-56ns). In Win 7 the latency is 19.5 - 20.2ns, regardless of the settings used.
Same setup, same version of AIDA (newest beta from 26th).

How well does Win 7 work? I forgot to check for info on that.
 

inf64

Diamond Member
Mar 11, 2011
Quote:
Fine.
Better (faster) than Win 10, IMO.
The tricky part is installing it, unless you use optical media (due to the lack of USB drivers on the installation media).
Naturally, you can add the USB drivers to the images using DISM, for example.

Have you done tests on Windows 7 with Ryzen? If you have, are there any performance differences between the two?
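The DISM route mentioned in the quote above boils down to mounting a Windows image, adding a driver folder to it, and committing the change. The sketch below only assembles those command lines and does not run them; the function name and all paths are hypothetical. The image index depends on the .wim file: in the installer's boot.wim, Windows Setup is usually index 2, and `dism /Get-WimInfo /WimFile:<file>` lists the indexes.

```python
from __future__ import annotations


def dism_driver_injection_commands(
    wim_file: str, index: int, mount_dir: str, driver_dir: str
) -> list[list[str]]:
    """Build (but do not run) the DISM command lines that add every driver
    found under driver_dir to one image inside a .wim file."""
    return [
        # mount the selected image from the .wim into an empty folder
        ["dism", "/Mount-Wim", f"/WimFile:{wim_file}", f"/Index:{index}",
         f"/MountDir:{mount_dir}"],
        # add all .inf drivers found in driver_dir and its subfolders
        ["dism", f"/Image:{mount_dir}", "/Add-Driver", f"/Driver:{driver_dir}",
         "/Recurse"],
        # write the changes back into the .wim and unmount it
        ["dism", "/Unmount-Wim", f"/MountDir:{mount_dir}", "/Commit"],
    ]


if __name__ == "__main__":
    # hypothetical paths: setup image in boot.wim, USB drivers in C:\drivers\usb
    for cmd in dism_driver_injection_commands(
        r"C:\wim\boot.wim", 2, r"C:\mount", r"C:\drivers\usb"
    ):
        print(" ".join(cmd))
```

The same three steps would be repeated for the install.wim image that gets installed, so the USB drivers are present both during setup and after the first boot.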
 

imported_jjj

Senior member
Feb 14, 2009
Quote:
Fine.
Better (faster) than Win 10, IMO.
The tricky part is installing it, unless you use optical media (due to the lack of USB drivers on the installation media).
Naturally, you can add the USB drivers to the images using DISM, for example.

That's great, thanks.
 

inf64

Diamond Member
Mar 11, 2011
Quote:
I've used Win 7 about 95% of the time on Ryzen.
Only the actual performance evaluation was done with Win 10, as it wasn't up to me.
The performance differences are minor, but very consistent (in favor of Win 7).

Great, thanks!
Do you think you could try what iBoMbY is suggesting with the bcdedit command? In theory it should split the 16-thread Ryzen into two NUMA groups.
 

The Stilt

Golden Member
Dec 5, 2015
inf64 said:
Great, thanks!
Do you think you could try what iBoMbY is suggesting with the bcdedit command? In theory it should split the 16-thread Ryzen into two NUMA groups.

I'll have to pass on this one.
I still need to validate several things, and tampering with such options would mandate reinstalling the OS and everything else, regardless of whether they're fully reversible or not :(
 

iBoMbY

Member
Nov 23, 2016
In theory it should be safe, and could be reversed with "bcdedit /deletevalue [name]". The settings only apply after rebooting. But yes, in theory this could lead to the system not booting, so a backup (or at least a restore point) would be advisable.
 

MajinCry

Platinum Member
Jul 28, 2015
The Stilt said:
Noticed a funny thing with the L3 latency reported by AIDA.
In Win 10 the latency is all over the place (19.7-56ns). In Win 7 the latency is 19.5 - 20.2ns, regardless of the settings used.
Same setup, same version of AIDA (newest beta from 26th).

Would you be able to run that draw call benchmark in Win 7, with a couple of other tweaks?

Apparently there have been motherboard BIOS updates that improved performance (faster DDR4 speeds), and disabling SMT also gives better performance. It would be good to see if there's a cumulative effect that boosts draw call performance.
 

The Stilt

Golden Member
Dec 5, 2015
MajinCry said:
Would you be able to run that draw call benchmark in Win 7, with a couple of other tweaks?

Apparently there have been motherboard BIOS updates that improved performance (faster DDR4 speeds), and disabling SMT also gives better performance. It would be good to see if there's a cumulative effect that boosts draw call performance.

The most recent BIOSes don't have any low-level changes compared to the one I was using.
I'll try it under Win 7.
 

iBoMbY

Member
Nov 23, 2016
Okay, I just tried my bcdedit settings on my old Intel (with groupsize 4 instead of 8), and the results in Coreinfo are pretty interesting:

Code:
Logical to Physical Processor Map:
Physical Processor 0 (Hyperthreaded):
**--
----
Physical Processor 1 (Hyperthreaded):
--**
----
Physical Processor 2 (Hyperthreaded):
----
**--
Physical Processor 3 (Hyperthreaded):
----
--**

Logical Processor to Socket Map:
Socket 0:
****
----
Socket 1:
----
****

Logical Processor to NUMA Node Map:
NUMA Node 0:
****
----
NUMA Node 1:
----
****
Calculating Cross-NUMA Node Access Cost...
                                       
Approximate Cross-NUMA Node Access Cost (relative to fastest):
     00  01
00: 1.5 1.1
01: 1.0 1.0

Logical Processor to Cache Map:
Data Cache          0, Level 1,   32 KB, Assoc   8, LineSize  64
**--
----
Instruction Cache   0, Level 1,   32 KB, Assoc   8, LineSize  64
**--
----
Unified Cache       0, Level 2,  256 KB, Assoc   8, LineSize  64
**--
----
Unified Cache       1, Level 3,    8 MB, Assoc  16, LineSize  64
****
----
Data Cache          1, Level 1,   32 KB, Assoc   8, LineSize  64
--**
----
Instruction Cache   1, Level 1,   32 KB, Assoc   8, LineSize  64
--**
----
Unified Cache       2, Level 2,  256 KB, Assoc   8, LineSize  64
--**
----
Data Cache          2, Level 1,   32 KB, Assoc   8, LineSize  64
----
**--
Instruction Cache   2, Level 1,   32 KB, Assoc   8, LineSize  64
----
**--
Unified Cache       3, Level 2,  256 KB, Assoc   8, LineSize  64
----
**--
Unified Cache       4, Level 3,    8 MB, Assoc  16, LineSize  64
----
****
Data Cache          3, Level 1,   32 KB, Assoc   8, LineSize  64
----
--**
Instruction Cache   3, Level 1,   32 KB, Assoc   8, LineSize  64
----
--**
Unified Cache       5, Level 2,  256 KB, Assoc   8, LineSize  64
----
--**

Logical Processor to Group Map:
Group 0:
****
----
Group 1:
----
****

It also affects the cache mapping, for one, and the system now looks pretty much like it has two physical processors. You also get a new option in Task Manager to view the CPU load per NUMA node. So it should have some influence on how Windows handles things...
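To read the maps in the Coreinfo output above, it helps to note that each row of a mask appears to correspond to one processor group and each character to one logical processor in that group ('*' = included, '-' = not included). That reading is inferred from this output (e.g. Group 0 is "****" on the first row, Group 1 on the second), not taken from Coreinfo's documentation, and the helper below is a hypothetical sketch built on it.

```python
from __future__ import annotations


def parse_coreinfo_mask(rows: list[str]) -> list[tuple[int, int]]:
    """Return (group, index) pairs for every '*' in a Coreinfo-style mask.

    Assumption inferred from the output above: row = processor group,
    character = logical processor within that group.
    """
    members = []
    for group, row in enumerate(rows):
        for index, ch in enumerate(row.strip()):
            if ch == "*":
                members.append((group, index))
            elif ch != "-":
                raise ValueError(f"unexpected character {ch!r} in row {row!r}")
    return members


# "Physical Processor 2" from the output above: first two slots of group 1
print(parse_coreinfo_mask(["----", "**--"]))  # [(1, 0), (1, 1)]
```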

Edit: The cache size is wrong now, but it is possible to set the L2 and L3 cache sizes via some registry settings; I'll try that now.

Edit2: The cache setting in the registry doesn't change anything. The bcdedit settings can be removed with:

Code:
bcdedit.exe /deletevalue groupsize
bcdedit.exe /deletevalue groupaware
After rebooting, everything looks normal again. The group settings don't seem to cause any instability (as long as you set the group size to a value divisible by 2). Windows scheduling is definitely influenced.
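The "divisible by 2" rule of thumb can be made a bit stricter. As far as I know, Microsoft's BCDEdit /set documentation describes groupsize as a power of 2 between 1 and 64 inclusive, so an even value such as 6 would still be out of range. A minimal check, with a hypothetical function name:

```python
def is_valid_groupsize(value: int) -> bool:
    """Check a bcdedit groupsize value against the documented range:
    a power of 2 from 1 to 64 inclusive."""
    # a positive power of 2 has exactly one bit set, so value & (value - 1) == 0
    return 1 <= value <= 64 and (value & (value - 1)) == 0


if __name__ == "__main__":
    for candidate in (4, 6, 8, 64, 128):
        print(candidate, is_valid_groupsize(candidate))
```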
 

bjt2

Senior member
Sep 11, 2016
iBoMbY said:
In theory it should be safe, and could be reversed with "bcdedit /deletevalue [name]". The settings only apply after rebooting. But yes, in theory this could lead to the system not booting, so a backup (or at least a restore point) would be advisable.

AFAIK, a restore point does not restore the BCD. A bootable USB drive or DVD is needed: if the system doesn't boot, boot from the USB drive or DVD, open the repair menu, start a command prompt, and use bcdedit to restore the settings...
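Since a restore point doesn't cover the BCD store, one option is to export the store with bcdedit before experimenting, so it can be imported again from the recovery command prompt. The sketch below only builds the command lines and does not run them; the function name and backup path are hypothetical, the commands need an elevated prompt, and drive letters inside the recovery environment may differ from those in normal Windows.

```python
from __future__ import annotations


def bcd_groupsize_plan(backup_file: str, groupsize: int) -> dict[str, list[str]]:
    """Build (but do not run) the bcdedit command lines for a reversible
    groupsize experiment."""
    return {
        # 1. back up the whole BCD store first
        "backup": ["bcdedit", "/export", backup_file],
        # 2. apply the setting (takes effect after a reboot)
        "apply": ["bcdedit", "/set", "groupsize", str(groupsize)],
        # 3. normal revert while Windows still boots
        "revert": ["bcdedit", "/deletevalue", "groupsize"],
        # 4. last resort from the recovery command prompt: restore the backup
        "restore": ["bcdedit", "/import", backup_file],
    }


if __name__ == "__main__":
    for step, cmd in bcd_groupsize_plan(r"D:\bcd-backup", 8).items():
        print(f"{step}: {' '.join(cmd)}")
```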
 
