
Dynamic Local Mode for Threadripper

Vattila

Senior member
Dynamic Local Mode — NUMA-aware optimisation for Threadripper 2990WX and 2970WX coming soon:

"Dynamic Local Mode is a new piece of software that automatically migrates the system’s most demanding application threads onto the [CPU cores] with local memory access. In other words: the apps that prefer local DRAM access will automatically receive it, and apps that scale to many cores will be free to do so."

community.amd.com

 
OK, does this include all applications? Even ones that use more than 32 threads?

Apparently it is a background service that moves threads around so that the heaviest threads run on the cores closest to memory. For throughput applications that are not sensitive to memory latency and scale well to high thread counts, you are probably better off putting the CPU in UMA mode, in which memory is interleaved across all controllers (near and far), making full use of the available bandwidth.
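Dynamic Local Mode itself is a Windows service, but the kind of per-thread migration it performs can be sketched with ordinary CPU-affinity calls. A minimal sketch using Linux's `sched_setaffinity` (shown for contrast; `pin_to_cpus` is a hypothetical helper, and which cores count as "near-memory" is an assumption about the topology):

```python
import os

def pin_to_cpus(pid, cpus):
    """Restrict a process (pid, or 0 for the caller) to the given CPU set.

    A NUMA-aware service would call this for the hottest threads,
    passing the set of cores with direct DRAM access.
    """
    os.sched_setaffinity(pid, cpus)   # Linux-only syscall wrapper
    return os.sched_getaffinity(pid)  # report the affinity actually in effect
```

For example, pinning the current process to the first half of its allowed CPUs (standing in for the dies with local memory) and verifying the kernel applied it:

```python
avail = sorted(os.sched_getaffinity(0))
near = set(avail[: max(1, len(avail) // 2)])  # assumption: "near" cores
pin_to_cpus(0, near)
```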
 
It is an add-on for the Windows 10 operating system, specifically.
Call it an optimization, or maybe call it a fix or workaround...?

(edit)
AMD said:
Just to be clear, Dynamic Local Mode is a new feature for the AMD Ryzen™ Threadripper™ 2990WX and 2970WX processors. Only these AMD Ryzen™ Threadripper™ processors have a mixed memory access design wherein some dies have direct memory access, while others access memory across the Infinity Fabric.
I still wonder how this is supposed to make a difference if the OS's kernel isn't severely broken. From the point of view of memory allocation, the two indirectly connected dies simply look like NUMA nodes whose local memory has already been allocated completely.
 
I wonder if this is going to be similar to modern multi-GPU implementations where each application will need a profile to map it to the appropriate cores? If so it’s not ideal, but it is a positive change regardless.
 
Huh. I wonder if the Linux scheduler has already been performing this sort of work out of the box? Lots of people were getting better performance out of Threadripper on Linux than in Win10.
 
Huh. I wonder if the Linux scheduler has already been performing this sort of work out of the box? Lots of people were getting better performance out of Threadripper on Linux than in Win10.
Wouldn't surprise me. Linux has had NUMA support since the days of the original Opteron. Windows wasn't even NUMA-aware at all until Vista, and still seems to have trouble with anything more than two NUMA nodes.
 
Huh. I wonder if the Linux scheduler has already been performing this sort of work out of the box? Lots of people were getting better performance out of Threadripper on Linux than in Win10.
Under Linux the scheduler is hardware-aware by default, so something like the 32-core Threadripper was never an issue to begin with. Meanwhile, under Windows everybody had to pretend the scheduler worked fine, and optimizations had to be bolted on through additional software/services, since Microsoft couldn't be bothered to plan ahead at the system level.
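For a rough look at what the Linux scheduler already knows, the NUMA node layout can be read straight from sysfs. A minimal sketch, assuming a Linux system with sysfs mounted (`numa_topology` is a hypothetical helper name):

```python
import glob
import os

def numa_topology(sysfs="/sys/devices/system/node"):
    """Return {node name: CPU list string} as the Linux kernel reports it.

    On a 2990WX in NUMA mode this would show four nodes, two of which
    have no locally attached memory.
    """
    topo = {}
    for node in sorted(glob.glob(os.path.join(sysfs, "node[0-9]*"))):
        with open(os.path.join(node, "cpulist")) as f:
            topo[os.path.basename(node)] = f.read().strip()
    return topo
```

On a single-socket desktop this typically prints just `{'node0': '0-7'}` or similar; the scheduler uses the same topology information to keep tasks near their memory.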
 