Based on older tests of llrPSP, I set my dual-7452 (that is, 2 × 8 × 4c/8t-CCXs, 16 MB L3$/CCX) to run 8 tasks in parallel. Each task is 8-threaded (that is, all tasks in total use only half of the hardware threads) and is pinned to half of the logical CPUs of 2 dedicated CCXs (pinned such that SMT siblings are not used). Having to cross one CCX border is a necessary evil on Zen 2. But at least the pinning keeps each task from being spread across an arbitrary number of CCXs. That way, and with 180 W cTDP and PPT per socket, task durations are 70,000…76,000 seconds (19.5…21 hours).
For now, I still implement the pinning to logical CPUs by means of multiple boinc client instances. That is, I have got 8 client instances running, each client process pinned to 2 dedicated CCXs, and each running only a single task at a time. The task inherits the CPU affinity of the client, of course.
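To make the layout concrete, here is a minimal sketch of how the 8 CPU sets could be computed. It assumes a Linux-style numbering in which logical CPUs 0..63 are the first hardware thread of cores 0..63 and 64..127 are their SMT siblings, and that CCXs are made of consecutive cores. Both are assumptions to be checked with `lscpu -e` on the actual host. The function names are made up for illustration.

```python
# Assumed topology: 2 sockets x 8 CCXs x 4 cores, with logical CPUs 0..63
# being one hardware thread per core and 64..127 the SMT siblings.
CORES_PER_CCX = 4
CCXS_PER_TASK = 2
TOTAL_CCXS = 16


def cpus_for_slot(slot):
    """Return the logical CPUs (one per core, no SMT siblings) of a task slot."""
    cores_per_slot = CORES_PER_CCX * CCXS_PER_TASK
    first = slot * cores_per_slot
    return list(range(first, first + cores_per_slot))


def layout():
    """Map every slot index to its CPU list; 16 CCXs / 2 per task = 8 slots."""
    return {slot: cpus_for_slot(slot) for slot in range(TOTAL_CCXS // CCXS_PER_TASK)}


if __name__ == "__main__":
    for slot, cpus in layout().items():
        # Print each CPU list in the range form accepted by `taskset -c`.
        print(f"client {slot}: taskset -c {cpus[0]}-{cpus[-1]}")
```

Under these assumptions, slots 0 to 3 land on the first socket and slots 4 to 7 on the second, and each client instance would be started under its own CPU list.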
A more convenient implementation for dual-socket computers would be to either
– run 2 client instances (one per socket, pinned to the logical CPUs of a dedicated socket), and have an external program (can be written in a scripting language) narrow down the CPU affinity of science tasks further, or
– run a single client instance, built from modified boinc source code which is extended to apply CPU affinity to science tasks. (Though in the present case, only the LLR subprocess needs CPU affinity, not its PrimeGrid wrapper process. I haven't looked up whether the source code of the wrapper is available.)
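The first of these two options, the external program, could look roughly like the following sketch. It is Linux-only and untested against a real PrimeGrid setup. The process name fragment passed as `name_part` is a placeholder, since I am not assuming what the LLR executable is actually called. The slot sizes assume the same CPU numbering as above (one logical CPU per core in the range 0..63), and all function names are made up.

```python
import os

SLOTS = 4           # assumed: 4 task slots per socket-wide client
CPUS_PER_SLOT = 8   # assumed: 2 CCXs x 4 cores, one logical CPU per core


def slot_cpus(slot, base=0):
    """Logical CPUs of one slot; `base` is the first CPU of the client's socket."""
    first = base + slot * CPUS_PER_SLOT
    return set(range(first, first + CPUS_PER_SLOT))


def assign_slots(pids, current):
    """Keep the slots of still-running PIDs, give new PIDs the lowest free slot."""
    kept = {pid: s for pid, s in current.items() if pid in pids}
    free = sorted(set(range(SLOTS)) - set(kept.values()))
    for pid in sorted(pids):
        if pid not in kept and free:
            kept[pid] = free.pop(0)
    return kept


def find_pids(name_part, base=0):
    """PIDs on this socket whose /proc/<pid>/comm contains name_part."""
    socket_cpus = set(range(base, base + SLOTS * CPUS_PER_SLOT))
    found = set()
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                name = f.read()
            # Only take processes that inherited this socket's affinity.
            if name_part in name and os.sched_getaffinity(int(entry)) <= socket_cpus:
                found.add(int(entry))
        except OSError:
            pass  # process exited while we were scanning
    return found


def pin(pid, cpus):
    """Apply the CPU set to every existing thread of the process."""
    for tid in os.listdir(f"/proc/{pid}/task"):
        os.sched_setaffinity(int(tid), cpus)


def run_once(name_part, current, base=0):
    """One polling pass: discover tasks, assign slots, narrow their affinity."""
    current = assign_slots(find_pids(name_part, base), current)
    for pid, slot in current.items():
        try:
            pin(pid, slot_cpus(slot, base))
        except OSError:
            pass  # task finished between discovery and pinning
    return current
```

One such script per socket would call `run_once` in a loop with a sleep in between, with `base=0` for the first socket and `base=32` for the second. Threads are pinned individually because `sched_setaffinity` on Linux acts on a single thread, and threads created later inherit the affinity of the thread that spawns them.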
Since my largest dual-socket computer type has got no more than 64c/128t, and I am often short of spare time, I haven't taken the time yet to implement one of the more convenient methods. That said, configuring and starting 8 client instances on a computer isn't a big deal either once it has been done before, i.e. when 99% of the config is already there.