HurleyBird
Platinum Member
- Apr 22, 2003
In the Zen 2 case, a thread is limited to filling one of the two 16 MB L3 slices in a chiplet.
In the Zen 3 case, a thread can fill the entire unified 32 MB L3 cache.
This means a potentially significantly higher hit rate (though at the supposed cost of an almost 20% latency hit).
Point is, you can replace "a thread" with "two threads," "three threads," or however many threads. The number of threads by itself doesn't make a difference. What does is the extent to which a dataset fits into a 16 MB L3, and the extent to which data is specific to individual threads versus shared between them.
A hypothetical single-threaded task that can consume the entire 32 MB L3 will benefit.
A hypothetical task that consumes all 16 threads in a chiplet and fills the entire 32 MB L3 with shared data will benefit enormously, even more than the prior example.
A hypothetical single-threaded task that fits entirely inside a 16 MB L3 will regress.
A program that creates two processes that each consume 8 threads and 16 MB (e.g., perfectly in line with the Zen 2 CCX structure) will regress severely, even more than the previous example.
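The four cases above can be run through the same kind of toy model: estimate each working set's hit rate under a split 2x16 MB design versus a unified 32 MB design, and compare expected access cost. Every latency and hit-rate number here is an assumption chosen to illustrate the argument, not a measurement:

```python
# Toy comparison of the four hypothetical cases. Cycle counts and hit-rate
# estimates are illustrative assumptions only.
SPLIT_LAT, UNIFIED_LAT, DRAM = 39, 47, 250  # cycles, hypothetical

def amat(l3_hit_cycles, l3_hit_rate):
    """Expected cycles per access: hit cost + miss rate * DRAM penalty."""
    return l3_hit_cycles + (1 - l3_hit_rate) * DRAM

cases = {
    # name: (assumed hit rate in a 16 MB slice, in a unified 32 MB cache)
    "1 thread, 32 MB working set":         (0.5, 1.0),  # half spills in split
    "16 threads, 32 MB shared data":       (0.5, 1.0),  # shared set now fits once
    "1 thread, 16 MB working set":         (1.0, 1.0),  # fit either way; latency decides
    "2 processes x 8 threads, 16 MB each": (1.0, 1.0),  # each fit a slice; latency decides
}
for name, (hr_split, hr_unified) in cases.items():
    delta = amat(SPLIT_LAT, hr_split) - amat(UNIFIED_LAT, hr_unified)
    verdict = "benefits" if delta > 0 else "regresses"
    print(f"{name}: {verdict} ({delta:+.1f} cycles/access)")
```

The capacity-starved cases win big because avoided DRAM misses dwarf the extra hit latency, while the cases that already fit in 16 MB pay the latency hit for nothing, which is the whole point of the list above.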
A significant majority of tasks will benefit, both single- and multi-threaded. But some minority of both single-threaded and multi-threaded tasks will regress. To say that "Such a tradeoff would have an advantage pretty much only for single threaded loads" is entirely misleading and seems to misunderstand how things work, not to mention that there are plenty of database workloads whose results would vehemently disagree with that statement.