What does everyone think is the best number of threads per job?
Short answer: Go with single-threaded jobs, and run only as many jobs at once on a host as there are physical cores (not SMT threads).
Long answer:
If you want to optimize for *speed*, then use ≈8 threads per job, run only as many jobs at once per host as leaves the SMT threads unused, and set CPU affinities so that each job is confined to a single last-level cache domain.
But if you want to optimize for *throughput* (which is what most people want in a credits-based challenge like this one), then the first question to answer is how many jobs to run at once per host, or, on AMD Zen, how many jobs to run at once per CCX.
The following applies to desktop and server AMD Zen (excluding dense server parts, 3D V-Cache variants, and variants with disabled cores):
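To find the last-level cache domains on Linux: the kernel exposes which logical CPUs share each cache under /sys/devices/system/cpu/cpuN/cache/indexM/ (files `level` and `shared_cpu_list`). The following Python sketch groups logical CPUs by shared L3. It assumes L3 is the last-level cache (true for the Zen parts discussed here), it only reads topology and does not set any affinity, and the function names are made up for illustration.

```python
import glob
import os


def parse_cpu_list(text: str) -> set[int]:
    """Parse a Linux CPU list such as "0-3,8-11" into a set of CPU ids."""
    cpus: set[int] = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def llc_domains(sysfs_root: str = "/sys/devices/system/cpu") -> list[list[int]]:
    """Group logical CPUs by the L3 cache they share (Linux sysfs only)."""
    domains: set[frozenset[int]] = set()
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "cache", "index[0-9]*")
    for cache_dir in glob.glob(pattern):
        try:
            # Only keep caches that report themselves as level 3.
            with open(os.path.join(cache_dir, "level")) as f:
                if f.read().strip() != "3":
                    continue
            with open(os.path.join(cache_dir, "shared_cpu_list")) as f:
                cpus = parse_cpu_list(f.read())
        except OSError:
            continue  # missing or unreadable entry: skip it
        if cpus:
            # Every CPU in a domain reports the same list, so a set dedups them.
            domains.add(frozenset(cpus))
    return sorted((sorted(d) for d in domains), key=lambda d: d[0])
```

Each returned list is one L3 domain (one CCX on Zen). On Linux, such a list can then be passed to `taskset -c` when starting a job, or to Python's `os.sched_setaffinity()`.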
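Zen 2 has 4c/8t CCXs with 16 MB L3$/CCX. 16,384 kB/CCX / 2,304 kB/job ≈ 7 jobs/CCX.
Round this down to 4 jobs/CCX (one per physical core) and test whether 1-threaded or 2-threaded jobs give higher throughput. (Zen 1 and Zen+ CCXs are also 4c/8t, but have only 8 MB L3$/CCX: 8,192 kB/CCX / 2,304 kB/job ≈ 3.6 jobs/CCX.)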
Zen 3 to Zen 5 have 8c/16t CCXs with 32 MB L3$/CCX. 32,768 kB/CCX / 2,304 kB/job ≈ 14 jobs/CCX.
Round this down to 8 jobs/CCX (one per physical core) and test whether 1-threaded or 2-threaded jobs give higher throughput.
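The cache arithmetic above can be written as a small Python sketch: divide the L3 size by the per-job working set (the 2,304 kB figure used above), then cap the result at one job per physical core. The function name is hypothetical.

```python
def jobs_per_ccx(l3_kb: int, cores_per_ccx: int, job_kb: int = 2304) -> tuple[int, int]:
    """Return (cache limit, suggested jobs) for one CCX.

    cache limit: how many per-job working sets fit into the L3 cache.
    suggested jobs: that limit, capped at one job per physical core.
    """
    cache_limit = l3_kb // job_kb
    return cache_limit, min(cache_limit, cores_per_ccx)


for label, l3_kb, cores in [("4c CCX, 16 MB L3", 16384, 4), ("8c CCX, 32 MB L3", 32768, 8)]:
    limit, jobs = jobs_per_ccx(l3_kb, cores)
    print(f"{label}: cache fits {limit} jobs, run {jobs} jobs/CCX")
# prints:
# 4c CCX, 16 MB L3: cache fits 7 jobs, run 4 jobs/CCX
# 8c CCX, 32 MB L3: cache fits 14 jobs, run 8 jobs/CCX
```

In both cases the core count, not the cache, is the binding limit, which is why the numbers get rounded down to the core count.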
The likely outcome of sufficiently precise throughput tests is that either 1-threaded jobs work slightly better, or 2-threaded jobs work a tiny bit better but at the cost of considerably more energy per task. A big benefit of 1-threaded jobs is that you don't have to set CPU affinity, at least not on Linux. 2-threaded jobs need CPU affinity set so that both threads of a job run on the same CCX; otherwise throughput suffers.
For a throughput test, you need the PRST executable and a suitable number of the form (n!/n#)±1. One way to get the latter is to wait until September 10 and then take the candidate number from a real workunit.
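To compare the configurations, you need throughput (tasks per day) and, if energy matters, energy per task. A minimal Python sketch, assuming you have measured, for each configuration, the average wall-clock time per task and the average host power draw (e.g. with a wall meter); the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Config:
    """One throughput test: a fixed number of concurrent jobs on one host."""
    label: str
    jobs_at_once: int        # concurrent tasks on the host (not threads)
    minutes_per_task: float  # measured average wall-clock time per task
    host_watts: float        # measured average host power during the test

    def tasks_per_day(self) -> float:
        # Each running slot completes one task every minutes_per_task.
        return self.jobs_at_once * 24 * 60 / self.minutes_per_task

    def wh_per_task(self) -> float:
        # Host energy spent during one task duration, shared by all concurrent jobs.
        return self.host_watts * (self.minutes_per_task / 60) / self.jobs_at_once


def best_by_throughput(configs: list[Config]) -> Config:
    """Pick the configuration with the most tasks per day."""
    return max(configs, key=Config.tasks_per_day)
```

Comparing `wh_per_task()` next to `tasks_per_day()` shows whether a small throughput gain from 2-threaded jobs is paid for with noticeably more energy per task.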
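For reference, n!/n# is the product of all composite numbers up to n (the "compositorial"). This Python sketch computes it for small n to illustrate the form only; real candidates use far larger n, and the sketch does not produce an input file in whatever format PRST expects.

```python
from math import factorial, isqrt, prod


def primorial(n: int) -> int:
    """n# = product of all primes <= n, using a simple sieve."""
    if n < 2:
        return 1
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, isqrt(n) + 1):
        if sieve[p]:
            # Mark multiples of p, starting at p*p, as composite.
            sieve[p * p :: p] = bytes(len(range(p * p, n + 1, p)))
    return prod(i for i in range(2, n + 1) if sieve[i])


def compositorial(n: int) -> int:
    """n!/n# = product of all composite numbers <= n."""
    return factorial(n) // primorial(n)


c = compositorial(10)   # 4 * 6 * 8 * 9 * 10
print(c - 1, c + 1)     # prints: 17279 17281
```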
PS, about speed vs. throughput: The PrimeGrid page I cited gave an estimate of 22 minutes of CPU time on average CPUs, which is short. Also, this project uses the "fast proof" verification scheme, so "everyone is first": every prime finder is credited as the first finder. These are further reasons not to bother optimizing for speed, and to optimize for throughput instead.
PPS, the PrimeGrid page I cited says "Multi-threading is supported and IS recommended". I believe this recommendation is mostly aimed at CPUs with less cache per core (notably ≤ 2 MB of last-level cache per core), e.g. various Intel desktop and mobile CPUs and several AMD mobile CPUs. Such CPUs can run only a few jobs at once before they hit a serious memory bandwidth bottleneck. With fewer jobs at once, each task should use correspondingly more cores, so that no core is left idle.
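A rough Python sketch of that idea, under assumptions that are not stated on the PrimeGrid page: let the L3 size decide how many jobs run at once, reduce that number until it divides the core count evenly, and give each job cores // jobs threads. It reuses the 2,304 kB per-job figure from above, treats cache fit as a stand-in for the memory bandwidth limit, and ignores hybrid P-core/E-core designs; the function name and example CPUs are hypothetical.

```python
def plan_host(cores: int, l3_kb: int, job_kb: int = 2304) -> tuple[int, int]:
    """Return (jobs_at_once, threads_per_job) for one L3 domain.

    Heuristic (an assumption): run as many jobs as the L3 can hold, at most
    one per core, then lower the count until every core gets exactly one thread.
    """
    jobs = max(1, min(cores, l3_kb // job_kb))
    while cores % jobs:
        jobs -= 1  # terminates at jobs == 1 at the latest
    return jobs, cores // jobs


print(plan_host(8, 12288))  # hypothetical 8-core CPU, 12 MB L3: prints (4, 2)
print(plan_host(8, 32768))  # 8-core Zen 3 CCX, 32 MB L3: prints (8, 1)
```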