Zen 5 Architecture & Technical Discussion


Covfefe

Member
Copying and pasting my comment from the Intel thread (oops).

AMD uses "statically partitioned" to refer to a resource that can never be used 100% by a single thread.

AMD uses "competitively shared" to mean that a single thread can use 100% of a resource, but when SMT is active the two threads have to split it.

For the micro-op cache, AMD calls the 50/50 split (when SMT is active) competitive sharing, not static partitioning.
 

Tuna-Fish

Golden Member
With that scheme there are two timing paths, depending on whether code is shared with another CPU thread (process) or not. OK, code timing side channels aren't as bad as data ones, but creating one for no gain is still an unnecessary risk.
The only situation in which this happens is if the same physical page is mapped into both processes. That makes it visible through other cache levels too (all caches above L1 are always physically tagged, and most L1 caches are physically tagged as well). Having the op cache physically tagged doesn't leak anything more.

The solution, then, is not to do that. If letting another process learn which pages you have resident in RAM is unacceptable, use a container or something similar and provide a separate copy of all your binaries; that way none of them will be shared with other processes.
 

MS_AT

Senior member
AMD uses "statically partitioned" to refer to a resource that can never be used 100% by a single thread.

AMD uses "competitively shared" to mean that a single thread can use 100% of a resource, but when SMT is active the two threads have to split it.

For the micro-op cache, AMD calls the 50/50 split (when SMT is active) competitive sharing, not static partitioning.
This is how they define it in the Software Optimization Guide:
These categories are defined as:
• Competitively Shared: Resource entries are assigned on demand. A thread may use all resource entries.
• Watermarked: Resource entries are assigned on demand. When in two-threaded mode a thread may not use more resource entries than are specified by a watermark threshold.
• Statically Partitioned: Resource entries are partitioned when entering two-threaded mode. A thread may not use more resource entries than are available in its partition.
which makes the distinction between the last two not exactly clear to me.
The op cache is supposedly shared via physical tags - but those gains aren't there, because hits can only be shared after TLB translation, which would require simultaneous L1i and op-cache scans. Both threads would only perform optimally with their own op-cache hits.
Why? Both threads are running the same loop, which is mapped to the same virtual memory of the process and backed by the same physical addresses; as long as execution stays in this loop it should hit quite nicely. Of course I cannot say when they resolve the translation, but at least part of the front end is powered down, so I doubt they are checking against L1i in parallel.
When instruction fetch misses in the Op Cache, and instructions are decoded after being read from the instruction cache (IC), they are also built into the Op Cache. (...)
The Op Cache is modal, and the processor can only transition between instruction cache mode (IC mode) and Op Cache mode (OC mode) at certain points. Instruction cache to Op Cache transitions can only happen at taken branches. The processor remains in Op Cache mode until an Op Cache miss is detected.
Excessive transitions between instruction cache and Op Cache mode may impact performance negatively.
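The modal behavior reads like a small state machine; here is a toy sketch of my reading (the event names and the event stream are invented, the real hardware obviously tracks far more state):
Code:
# Toy sketch of the modal IC/OC front end described in the quote above.
def run_frontend(events):
    mode = "IC"              # fetch starts in instruction cache mode
    transitions = 0
    for ev in events:
        if mode == "IC" and ev == "taken_branch_hits_oc":
            mode = "OC"      # IC -> OC transitions only at taken branches
            transitions += 1
        elif mode == "OC" and ev == "oc_miss":
            mode = "IC"      # stay in OC mode until an op-cache miss
            transitions += 1
    return transitions

# Code that keeps ping-ponging between the modes shows why excessive
# transitions are called out as a performance hazard:
print(run_frontend(["taken_branch_hits_oc", "oc_miss"] * 4))   # -> 8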
 

Anacapols

Junior Member
which makes the distinction between the last two not exactly clear to me.
To me it reads like:
- Statically partitioned enforces a 50/50 split in 2T mode.
- Watermarked allows a skewed split (like 30/70) up to some point (neither thread can use more than x% for some 50 < x < 100).
- Competitively shared allows all the way up to 100/0 if one thread is not making use of those resources, potentially making the other thread wait for entries to free up when it needs them.

This puts watermarked as a midway point between the two, allowing SMT to utilize resources more effectively without either thread ever getting "stuck" / completely missing required resources.
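As a toy allocator (the entry count, the watermark value, and all the code are my own illustration, not anything from AMD's documentation):
Code:
# Toy model of the three sharing categories. ENTRIES and WATERMARK are
# invented for illustration; this is not AMD's implementation.
ENTRIES = 64
WATERMARK = 48   # hypothetical per-thread cap for a "watermarked" resource

def may_allocate(policy, thread_used, total_used, two_threaded):
    if total_used >= ENTRIES:
        return False                        # structure is full, full stop
    if not two_threaded or policy == "competitively_shared":
        return True                         # one thread may take everything
    if policy == "watermarked":
        return thread_used < WATERMARK      # skewed splits allowed, up to a cap
    if policy == "statically_partitioned":
        return thread_used < ENTRIES // 2   # hard 50/50 split in 2T mode
    raise ValueError(policy)

# 2T mode, one thread already holding 40 of the 64 entries:
for p in ("competitively_shared", "watermarked", "statically_partitioned"):
    print(p, may_allocate(p, thread_used=40, total_used=50, two_threaded=True))
# -> True, True (40 < 48), False (40 >= 32)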
 
Yeah, I'm thinking that the watermark threshold is dynamically determined based on historical usage of the resources by both threads: if one thread has "suffered" for some period of time due to reduced access to resources, the other thread may get a reduced resource allocation for a while too, to allow the starved thread to get some work done. It's probably an automatic balancing mechanism meant to ensure a fairer allocation of resources than pure competitive sharing. I suppose the different categories are defined based on the importance of the resources, like so:

Statically partitioned resources: Threads absolutely cannot work without having exclusive access to these resources.

Competitively shared: A thread may use as much of a resource as possible to get through its execution quickly enough. Time spent using these resources is not enough to impact the other thread too adversely, even if some waiting time is incurred.

Watermarked: These resources are less precious than statically partitioned ones but more valuable than competitively shared ones, so their allocation needs to be managed fairly.
 

StefanR5R

Elite Member
Statically partitioned resources: Threads absolutely cannot work without having exclusive access to these resources.
Or: it's just not worth the effort to implement sharing; simply give exclusive copies of the hardware to each thread.

Competitively shared: A thread may use as much of a resource as possible to get through its execution quickly enough. Time spent using these resources is not enough to impact the other thread too adversely, even if some waiting time is incurred.
I am guessing that some non-trivial quality-of-service policy / fairness policy is still involved in competitive sharing.

Watermarked: These resources are less precious than statically partitioned ones but more valuable than competitively shared ones, so their allocation needs to be managed fairly.
IOW, a little exclusive emergency partition of the resource is reserved for each thread. Presumably due to latency considerations.

This puts watermarked as a midway point between the two,
Or another way of looking at it:
"Statically partitioned" is an edge case of "watermarked" in which each thread gets a watermark of 50%.
"Competitively shared" is an edge case of "watermarked" in which each thread gets a watermark of 100%.

I'm thinking that the watermark threshold is dynamically determined based on historical usage of the resources by both threads
My guess is that the watermarks are static. But I may be wrong about this of course.
 

naukkis

Golden Member
Why? Both threads are running the same loop, which is mapped to the same virtual memory of the process and backed by the same physical addresses; as long as execution stays in this loop it should hit quite nicely. Of course I cannot say when they resolve the translation, but at least part of the front end is powered down, so I doubt they are checking against L1i in parallel.
The op cache isn't partitioned as cache lines but as instruction sequences. With a full trace cache there could be multiple cache hits for the same instruction location from different calling points, so I don't even understand how they could possibly use physical mapping. The op cache is split per thread for a reason - even when running in the same address space, hits between threads are forbidden because they are invalid. Probably that AMD document is just wrong about the op cache - or AMD is using a wildly different kind of op cache than those documented.
 

MS_AT

Senior member
The op cache isn't partitioned as cache lines but as instruction sequences. With a full trace cache there could be multiple cache hits for the same instruction location from different calling points, so I don't even understand how they could possibly use physical mapping. The op cache is split per thread for a reason - even when running in the same address space, hits between threads are forbidden because they are invalid. Probably that AMD document is just wrong about the op cache - or AMD is using a wildly different kind of op cache than those documented.
Sorry, I think they are at least somewhat competent and would not publish something completely misleading. So unless you are able to demonstrate that the description is misleading, I'll keep treating it as accurate.

Btw, the icache is also PIPT, so it matches the u-op cache.

IIRC, a single uop cache entry can cover at most two consecutive cachelines, according to the manual.
 

naukkis

Golden Member
Btw, the icache is also PIPT, so it matches the u-op cache.

From AMD documentation we know that the icache is also accessed with a linear-address utag. So not only the op cache but the level after it - the icache - is accessed without physical address translation. There isn't any performance to be gained from doing a physical tag search in the op cache; it would only need to be done if L1i weren't inclusive of the op cache, but as the op cache is neither cache-line based nor holds the same data as the other cache levels, that seems really unlikely.
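The way a linear utag scheme usually works: the way is predicted from untranslated linear-address bits first, and a physical tag check confirms the hit afterwards. A sketch, with the hash, the tag widths and the structure sizes all invented:
Code:
# Sketch of a utag (linear-address microtag) lookup. Everything here is
# illustrative; the real hash, tag widths and sizes are not public in detail.
WAYS, SETS = 8, 64

def utag(linear):                    # tiny hash of linear-address bits
    return (linear >> 12) & 0xFF

def lookup(cache, linear, phys):
    s = (linear >> 6) % SETS         # set index: bits inside the 4K page
    for w in range(WAYS):            # utag match predicts the way...
        entry = cache[s][w]
        if entry and entry["utag"] == utag(linear):
            return entry["ptag"] == (phys >> 12)   # ...physical tag confirms
    return False                     # no utag match: treated as a miss

cache = [[None] * WAYS for _ in range(SETS)]
cache[(0x4000 >> 6) % SETS][0] = {"utag": utag(0x4000), "ptag": 0x1234}
print(lookup(cache, 0x4000, 0x1234 << 12))   # -> True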