Zen 5 Architecture & Technical Discussion


Covfefe

Member
Copying and pasting my comment from the Intel thread (oops).

AMD uses "statically partitioned" to refer to a resource that can never be used 100% by a single thread.

AMD uses "competitively shared" to mean that a single thread can use 100% of a resource, but when SMT is active the two threads have to split it.

For the micro-op cache, AMD calls the 50/50 split (when SMT is active) competitive sharing, not static partitioning.
 

Tuna-Fish

Golden Member
With that scheme there are two timing paths, depending on whether code is shared with another CPU thread (process) or not. OK, code timing side channels aren't as bad as data ones, but creating one for no gain is still an unnecessary risk.
The only situation in which this happens is if the same physical page is mapped into both processes. That makes it visible through other cache levels too (all caches above L1 are always physically tagged, and most L1 caches are physically tagged as well). Having the op cache physically tagged doesn't leak anything more.

The solution, then, is not to do that. If letting another process learn which pages you have resident in RAM is unacceptable, use a container or something similar and provide a separate copy of all your binaries; that way none of them will be shared with other processes.
 

MS_AT

Senior member
AMD uses "statically partitioned" to refer to a resource that can never be used 100% by a single thread.

AMD uses "competitively shared" to mean that a single thread can use 100% of a resource, but when SMT is active the two threads have to split it.

For the micro-op cache, AMD calls the 50/50 split (when SMT is active) competitive sharing, not static partitioning.
This is how they define it in the Software Optimization Guide:
These categories are defined as:
• Competitively Shared: Resource entries are assigned on demand. A thread may use all resource entries.
• Watermarked: Resource entries are assigned on demand. When in two-threaded mode a thread may not use more resource entries than are specified by a watermark threshold.
• Statically Partitioned: Resource entries are partitioned when entering two-threaded mode. A thread may not use more resource entries than are available in its partition.
which makes the distinction between the last two not exactly clear to me.
The op cache is supposedly shared via physical tags - but those gains aren't there, because hits can only be shared after TLB translation, which would require simultaneous L1i and op-cache scans. Both threads would only perform optimally with their own op-cache hits.
Why? Both threads are running the same loop, which is mapped to the same virtual memory of the process and backed by the same physical addresses; as long as execution stays in this loop it should hit quite nicely. Of course I cannot say when they resolve the translation, but at least part of the front end is powered down, so I doubt they are checking against L1i in parallel.
When instruction fetch misses in the Op Cache, and instructions are decoded after being read from the instruction cache (IC), they are also built into the Op Cache. (...)
The Op Cache is modal, and the processor can only transition between instruction cache mode (IC mode) and Op Cache mode (OC mode) at certain points. Instruction cache to Op Cache transitions can only happen at taken branches. The processor remains in Op Cache mode until an Op Cache miss is detected.
Excessive transitions between instruction cache and Op Cache mode may impact performance negatively.
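The modal behavior reads like a small state machine; here is a toy sketch of my reading (the event names and the event stream are invented, the real hardware obviously tracks far more state):
Code:
# Toy sketch of the modal IC/OC front end described in the quote above.
def run_frontend(events):
    mode = "IC"              # fetch starts in instruction cache mode
    transitions = 0
    for ev in events:
        if mode == "IC" and ev == "taken_branch_hits_oc":
            mode = "OC"      # IC -> OC transitions only at taken branches
            transitions += 1
        elif mode == "OC" and ev == "oc_miss":
            mode = "IC"      # stay in OC mode until an op-cache miss
            transitions += 1
    return transitions

# Code that keeps ping-ponging between the modes shows why excessive
# transitions are called out as a performance hazard:
print(run_frontend(["taken_branch_hits_oc", "oc_miss"] * 4))   # -> 8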
 

Anacapols

Junior Member
which makes the distinction between the last two not exactly clear to me.
To me it reads like:
- Statically partitioned enforces a 50/50 split in 2T mode.
- Watermarked allows a skewed split (like 30/70) up to some point (neither thread can use more than x% for some 50 < x < 100).
- Competitively shared allows all the way up to 100/0 if one thread is not making use of those resources, potentially making the other thread wait for entries to free up when it needs them.

This puts watermarked as a midway point between the two, allowing SMT to utilize resources more effectively without either thread ever getting "stuck" / completely missing required resources.
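As a toy allocator (the entry count, the watermark value, and all the code are my own illustration, not anything from AMD's documentation):
Code:
# Toy model of the three sharing categories. ENTRIES and WATERMARK are
# invented for illustration; this is not AMD's implementation.
ENTRIES = 64
WATERMARK = 48   # hypothetical per-thread cap for a "watermarked" resource

def may_allocate(policy, thread_used, total_used, two_threaded):
    if total_used >= ENTRIES:
        return False                        # structure is full, full stop
    if not two_threaded or policy == "competitively_shared":
        return True                         # one thread may take everything
    if policy == "watermarked":
        return thread_used < WATERMARK      # skewed splits allowed, up to a cap
    if policy == "statically_partitioned":
        return thread_used < ENTRIES // 2   # hard 50/50 split in 2T mode
    raise ValueError(policy)

# 2T mode, one thread already holding 40 of the 64 entries:
for p in ("competitively_shared", "watermarked", "statically_partitioned"):
    print(p, may_allocate(p, thread_used=40, total_used=50, two_threaded=True))
# -> True, True (40 < 48), False (40 >= 32)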
 
Yeah, I'm thinking that the watermark threshold is dynamically determined based on historical usage of the resources by both threads: if one thread has "suffered" for some period of time due to reduced access to resources, the other thread may get a reduced resource allocation for a while too, to allow the starved thread to get some work done. It's probably an automatic balancing mechanism meant to ensure a fairer allocation of resources than pure competitive sharing. I suppose the different categories are defined based on the importance of the resources, like so:

Statically partitioned resources: Threads absolutely cannot work without having exclusive access to these resources.

Competitively shared: A thread may use as much of a resource as possible to get through its execution quickly enough. Time spent using these resources is not enough to impact the other thread too adversely, even if some waiting time is incurred.

Watermarked: These resources are less precious than statically partitioned ones but more valuable than competitively shared ones, so their allocation needs to be managed fairly.
 

StefanR5R

Elite Member
Statically partitioned resources: Threads absolutely cannot work without having exclusive access to these resources.
Or: it's just not worth the effort to implement sharing; simply give exclusive copies of the hardware to each thread.

Competitively shared: A thread may use as much of a resource as possible to get through its execution quickly enough. Time spent using these resources is not enough to impact the other thread too adversely, even if some waiting time is incurred.
I am guessing that some non-trivial quality-of-service policy / fairness policy is still involved in competitive sharing.

Watermarked: These resources are less precious than statically partitioned ones but more valuable than competitively shared ones, so their allocation needs to be managed fairly.
IOW, a little exclusive emergency partition of the resource is reserved for each thread. Presumably due to latency considerations.

This puts watermarked as a midway point between the two,
Or another way of looking at it:
"Statically partitioned" is an edge case of "watermarked" in which each thread gets a watermark of 50%.
"Competitively shared" is an edge case of "watermarked" in which each thread gets a watermark of 100%.

I'm thinking that the watermark threshold is dynamically determined based on historical usage of the resources by both threads
My guess is that the watermarks are static. But I may be wrong about this of course.
 

naukkis

Golden Member
Why? Both threads are running the same loop, which is mapped to the same virtual memory of the process and backed by the same physical addresses; as long as execution stays in this loop it should hit quite nicely. Of course I cannot say when they resolve the translation, but at least part of the front end is powered down, so I doubt they are checking against L1i in parallel.
The op cache isn't partitioned as cache lines but as instruction sequences. With a full trace cache there could be multiple cache hits for the same instruction location from different calling points, so I don't even understand how they could possibly use physical mapping. The op cache is split per thread for a reason - even when running in the same address space, hits between threads are forbidden because they are invalid. Probably that AMD document is just wrong about the op cache - or AMD is using a wildly different kind of op cache than those documented.
 

MS_AT

Senior member
The op cache isn't partitioned as cache lines but as instruction sequences. With a full trace cache there could be multiple cache hits for the same instruction location from different calling points, so I don't even understand how they could possibly use physical mapping. The op cache is split per thread for a reason - even when running in the same address space, hits between threads are forbidden because they are invalid. Probably that AMD document is just wrong about the op cache - or AMD is using a wildly different kind of op cache than those documented.
Sorry, I think they are at least somewhat competent and would not publish something completely misleading. So unless you are able to demonstrate that the description is misleading, I'll keep treating it as accurate.

Btw, the icache is also PIPT, so it matches the u-op cache.

IIRC, a single uop cache entry can cover at most two consecutive cachelines, according to the manual.
 

naukkis

Golden Member
Btw, the icache is also PIPT, so it matches the u-op cache.

From AMD documentation we know that the icache is also accessed with a linear-address utag. So not only the op cache but the level after it - the icache - is accessed without physical address translation. There isn't any performance to be gained from doing a physical tag search in the op cache; it would only need to be done if L1i weren't inclusive of the op cache, but as the op cache is neither cache-line based nor holds the same data as the other cache levels, that seems really unlikely.
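The way a linear utag scheme usually works: the way is predicted from untranslated linear-address bits first, and a physical tag check confirms the hit afterwards. A sketch, with the hash, the tag widths and the structure sizes all invented:
Code:
# Sketch of a utag (linear-address microtag) lookup. Everything here is
# illustrative; the real hash, tag widths and sizes are not public in detail.
WAYS, SETS = 8, 64

def utag(linear):                    # tiny hash of linear-address bits
    return (linear >> 12) & 0xFF

def lookup(cache, linear, phys):
    s = (linear >> 6) % SETS         # set index: bits inside the 4K page
    for w in range(WAYS):            # utag match predicts the way...
        entry = cache[s][w]
        if entry and entry["utag"] == utag(linear):
            return entry["ptag"] == (phys >> 12)   # ...physical tag confirms
    return False                     # no utag match: treated as a miss

cache = [[None] * WAYS for _ in range(SETS)]
cache[(0x4000 >> 6) % SETS][0] = {"utag": utag(0x4000), "ptag": 0x1234}
print(lookup(cache, 0x4000, 0x1234 << 12))   # -> True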