Dynamic Local Mode for Threadripper

Vattila

Senior member
Oct 22, 2004
363
59
136
#1
Dynamic Local Mode — NUMA-aware optimisation for Threadripper 2990WX and 2970WX coming soon:

"Dynamic Local Mode is a new piece of software that automatically migrates the system’s most demanding application threads onto the [CPU cores] with local memory access. In other words: the apps that prefer local DRAM access will automatically receive it, and apps that scale to many cores will be free to do so."

community.amd.com

 

Markfw

CPU Moderator, VC&G Moderator, Elite Member
Super Moderator
May 16, 2002
17,257
660
136
#2
OK, does this include all applications? Even ones that use more than 32 threads?
 

Vattila

Senior member
Oct 22, 2004
363
59
136
#3
Markfw said:
OK, does this include all applications? Even ones that use more than 32 threads?
Apparently it is a background service that moves threads around so that the heaviest threads run on the CPUs closest to memory. For throughput applications that are not memory latency sensitive and scale well to high thread counts, I guess you are better off putting the CPU in UMA mode, in which the memory is interleaved across all controllers (near and far), thus making full use of the bandwidth potential.
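To make the idea concrete, here is a toy sketch of that kind of placement: rank threads by measured CPU time and hand the cores with local memory to the busiest ones first. This is not AMD's implementation. The function name `place_threads`, the thread names, and the core numbers are all hypothetical, and a real service would re-measure periodically and set OS thread affinity rather than return a dictionary.

```python
from typing import Dict, List


def place_threads(cpu_time: Dict[str, float],
                  local_cores: List[int],
                  remote_cores: List[int]) -> Dict[str, int]:
    """Map each thread to one core, busiest threads first.

    cpu_time     -- measured CPU seconds per thread (hypothetical sample data)
    local_cores  -- cores on dies with directly attached memory
    remote_cores -- cores on dies that reach memory over the fabric
    """
    # Heaviest threads first, so they take the local-memory cores.
    ranked = sorted(cpu_time, key=lambda name: cpu_time[name], reverse=True)
    cores = local_cores + remote_cores
    if len(ranked) > len(cores):
        raise ValueError("toy model assumes at most one thread per core")
    return dict(zip(ranked, cores))


# Hypothetical measurements and core IDs, for illustration only.
samples = {"render": 9.5, "ui": 0.3, "encode": 7.1, "indexer": 1.2}
placement = place_threads(samples, local_cores=[0, 1], remote_cores=[16, 17])
print(placement)
```

With these made-up numbers, the two heaviest threads ("render" and "encode") land on the local cores and the light ones go to the remote cores.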
 

StefanR5R

Platinum Member
Dec 10, 2016
2,294
404
106
#4
It is an add-on for the Windows 10 operating system, specifically.
Call it an optimization, or maybe call it a fix or workaround...?

(edit)
AMD said:
Just to be clear, Dynamic Local Mode is a new feature for the AMD Ryzen™ Threadripper™ 2990WX and 2970WX processors. Only these AMD Ryzen™ Threadripper™ processors have a mixed memory access design wherein some dies have direct memory access, while others access memory across the Infinity Fabric.
I still wonder how this is supposed to make a difference if the OS's kernel isn't severely broken. From the POV of a current memory allocation, the two indirectly connected dies are simply like NUMA nodes whose local memory has already been allocated completely.
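A toy model of what I mean, assuming a simple "nearest node with free pages" allocation policy. The node numbering, page counts, and distance values are made up for illustration, and `pick_node` is a hypothetical helper, not a kernel API.

```python
from typing import Dict


def pick_node(cpu_node: int,
              free_pages: Dict[int, int],
              distance: Dict[int, Dict[int, int]]) -> int:
    """Return the node a new page would come from under a
    nearest-node-with-free-memory policy (an assumed policy)."""
    candidates = [node for node, free in free_pages.items() if free > 0]
    if not candidates:
        raise MemoryError("no free pages on any node")
    return min(candidates, key=lambda node: distance[cpu_node][node])


# Four dies; only nodes 0 and 2 have memory attached (illustrative numbering).
distance = {a: {b: (10 if a == b else 16) for b in range(4)} for a in range(4)}
memoryless = {0: 1000, 1: 0, 2: 1000, 3: 0}

# A node with memory whose local pages are all used up.
exhausted = {0: 0, 1: 0, 2: 1000, 3: 0}

print(pick_node(0, memoryless, distance))  # thread on a die with memory
print(pick_node(1, memoryless, distance))  # thread on a die without memory
print(pick_node(0, exhausted, distance))   # thread on a die whose memory is full
```

In this model a thread on node 1 gets remote memory for the same reason a thread on a full node 0 does: there is nothing free locally, so the allocator falls back to another node.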
 

Charlie22911

Senior member
Mar 19, 2005
525
31
116
#5
I wonder if this is going to be similar to modern multi-GPU implementations, where each application needs a profile to map it to the appropriate cores? If so, it’s not ideal, but it is a positive change regardless.
 

StefanR5R

Platinum Member
Dec 10, 2016
2,294
404
106
#6
AMD's tool bases its decisions on run-time measurements, not on hardwired profiles, from what I understand.
 

kjboughton

Senior member
Dec 19, 2007
330
6
116
#8
But....but...NUMA doesn’t matter!
 
Apr 27, 2000
11,035
643
126
#9
Huh. I wonder if the Linux scheduler has already been performing this sort of work out-of-the-box? Lots of people were getting better performance out of Threadripper on Linux than in Win10.
 

BigDaveX

Senior member
Jun 12, 2014
321
35
101
#10
Huh. I wonder if the Linux scheduler has already been performing this sort of work out-of-the-box? Lots of people were getting better performance out of Threadripper on Linux than in Win10.
Wouldn't surprise me. Linux has had NUMA support since the days of the original Opteron. Windows wasn't even NUMA-aware at all until Vista, and still seems to have trouble with anything more than two NUMA nodes.
 

moinmoin

Senior member
Jun 1, 2017
668
185
96
#11
Huh. I wonder if the Linux scheduler has already been performing this sort of work out-of-the-box? Lots of people were getting better performance out of Threadripper on Linux than in Win10.
Under Linux the scheduler is hardware-aware by default, so stuff like the 32c Threadripper was never an issue to begin with. Meanwhile, under Windows everybody had to pretend the scheduler works fine, and optimization had to be added through additional software/services since Microsoft couldn't be bothered to plan ahead at the system level.
 
