Dynamic Local Mode for Threadripper

Vattila

Senior member
Oct 22, 2004
812
1,429
136
Dynamic Local Mode — NUMA-aware optimisation for Threadripper 2990WX and 2970WX coming soon:

"Dynamic Local Mode is a new piece of software that automatically migrates the system’s most demanding application threads onto the [CPU cores] with local memory access. In other words: the apps that prefer local DRAM access will automatically receive it, and apps that scale to many cores will be free to do so."

community.amd.com


Markfw

Moderator Emeritus, Elite Member
May 16, 2002
26,467
15,580
136
OK, does this include all applications? Even ones that use more than 32 threads?
 

Vattila

Markfw said:
OK, does this include all applications? Even ones that use more than 32 threads?

Apparently it is a background service that moves threads around so that the heaviest threads run on the cores closest to memory. For throughput applications that are not sensitive to memory latency and scale well to high thread counts, I guess you are better off putting the CPU in UMA mode, in which memory is interleaved across all controllers (near and far), thus making full use of the available bandwidth.
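To make the idea concrete, here is a minimal sketch of that kind of policy. It is not AMD's implementation: the function name, the use of CPU time as the "heaviness" measure, and the core numbers in the usage note are all assumptions for illustration. It ranks threads by measured CPU time and hands out cores with local memory first, then the remaining cores.

```python
def assign_threads(cpu_time_by_thread, local_cores, remote_cores):
    """Map thread ids to core ids, busiest threads first.

    Hypothetical policy sketch: cores with local memory access are handed
    out before cores that reach memory across the fabric. If there are more
    threads than cores, the assignment wraps around and cores are shared.
    """
    pool = list(local_cores) + list(remote_cores)
    if not pool:
        raise ValueError("no cores available")
    # Heaviest thread first; ties keep their original order.
    ranked = sorted(cpu_time_by_thread, key=cpu_time_by_thread.get, reverse=True)
    return {tid: pool[i % len(pool)] for i, tid in enumerate(ranked)}
```

For example, `assign_threads({"a": 5.0, "b": 50.0, "c": 1.0}, [0], [8, 9])` puts thread "b" on core 0 (the only local core in this made-up layout) and the two lighter threads on cores 8 and 9. A real service would presumably repeat the measurement periodically and re-pin threads as the load changes.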
 

StefanR5R

Elite Member
Dec 10, 2016
6,123
9,239
136
It is an add-on for the Windows 10 operating system, specifically.
Call it an optimization, or maybe call it a fix or workaround...?

(edit)
AMD said:
Just to be clear, Dynamic Local Mode is a new feature for the AMD Ryzen™ Threadripper™ 2990WX and 2970WX processors. Only these AMD Ryzen™ Threadripper™ processors have a mixed memory access design wherein some dies have direct memory access, while others access memory across the Infinity Fabric.
I still wonder how this is supposed to make a difference if the OS's kernel isn't severely broken. From the POV of a current memory allocation, the two indirectly connected dies are simply like NUMA nodes whose local memory has already been allocated completely.
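A rough sketch of what I mean, assuming a simplified allocator that always takes memory from the nearest node that still has some free. The node numbering, the free-memory figures and the distance values in the usage note are invented for illustration and are not read from a real 2990WX.

```python
def pick_memory_node(cpu_node, free_bytes, distance):
    """Return the nearest node (as seen from cpu_node) that has free memory.

    free_bytes: {node: bytes free on that node}
    distance:   {from_node: {to_node: relative access cost}}
    Ties are broken by the lower node number.
    """
    candidates = [node for node, free in free_bytes.items() if free > 0]
    if not candidates:
        raise MemoryError("no node has free memory")
    return min(candidates, key=lambda node: (distance[cpu_node][node], node))
```

With nodes 0 and 2 holding all the DRAM (`free_bytes = {0: 8 << 30, 1: 0, 2: 8 << 30, 3: 0}`) and a distance table where every remote hop costs 16 and local access costs 10, a thread on node 1 gets memory from node 0, exactly as it would if node 1 had memory of its own that was already used up. Under this model the allocator needs no special case for the compute-only dies.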
 

Charlie22911

Senior member
Mar 19, 2005
614
228
116
I wonder if this is going to be similar to modern multi-GPU implementations where each application will need a profile to map it to the appropriate cores? If so it’s not ideal, but it is a positive change regardless.
 

StefanR5R

AMD's tool bases its decisions on run-time measurements, not on hardwired profiles, from what I understand.
 

DrMrLordX

Lifer
Apr 27, 2000
22,256
11,987
136
Huh. I wonder if the Linux scheduler has already been performing this sort of work out-of-the-box? Lots of people were getting better performance out of Threadripper on Linux than in Win10.
 

BigDaveX

Senior member
Jun 12, 2014
440
216
116
DrMrLordX said:
Huh. I wonder if the Linux scheduler has already been performing this sort of work out-of-the-box? Lots of people were getting better performance out of Threadripper on Linux than in Win10.
Wouldn't surprise me. Linux has had NUMA support since the days of the original Opteron. Windows wasn't even NUMA-aware at all until Vista, and still seems to have trouble with anything more than two NUMA nodes.
 

moinmoin

Diamond Member
Jun 1, 2017
5,151
8,249
136
DrMrLordX said:
Huh. I wonder if the Linux scheduler has already been performing this sort of work out-of-the-box? Lots of people were getting better performance out of Threadripper on Linux than in Win10.
Under Linux the scheduler is hardware-aware by default, so hardware like the 32-core Threadripper was never an issue to begin with. Meanwhile, under Windows everybody had to pretend the scheduler works fine, and optimization had to be added through additional software/services since Microsoft couldn't be bothered to plan ahead at the system level.
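For anyone who wants to look at the topology Linux works with, the kernel exposes it under `/sys/devices/system/node/`, where each `nodeN/cpulist` file lists the CPUs belonging to that NUMA node. The sketch below only reads that information from user space; it says nothing about what the scheduler does internally, and the helper names are my own.

```python
def parse_cpulist(text):
    """Parse a sysfs cpulist string such as "0-7,32-39" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty cpulist means the node has no CPUs
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def node_cpus(node, sysfs_root="/sys/devices/system/node"):
    """Read the CPUs of one NUMA node from sysfs (Linux only)."""
    with open(f"{sysfs_root}/node{node}/cpulist") as f:
        return parse_cpulist(f.read())
```

On a Linux box, the resulting set can be passed to `os.sched_setaffinity(0, node_cpus(0))` to pin the current process to node 0 by hand, which is roughly the manual version of what tools like `numactl` do.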