I did a quick test of every "b" at the "min n in progress"/ "max n in progress"/ "max n loaded" values which are currently listed on the Generalized Cullen Woodall Prime Search statistics page:
b | n (min in progress/ max in progress/ max loaded) | FMA3 FFT length | FMA3 FFT size |
---|---|---|---|
13 | 4234840/ 4330158/ 4343034 | 1920K | 15 MB |
29 | 3259942/ 3298500/ 3308112 | 2016K | 15.75 MB |
47 | 2850640/ 2884836/ 2893336 | 2M/ 2304K/ 2304K | 16 MB/ 18 MB/ 18 MB |
49 | 2817958/ 2853898/ 2862348 | 1792K | 14 MB |
55 | 2707792/ 2771662/ 2779828 | 2M/ 2304K/ 2304K | 16 MB/ 18 MB/ 18 MB |
69 | 2581852/ 2623208/ 2630906 | 2304K | 18 MB |
101 | 2379576/ 2406490/ 2413702 | 2304K | 18 MB |
109 | 2323728/ 2367570/ 2374138 | 2304K | 18 MB |
121 | 2290840/ 2316018/ 2322678 | 1920K | 15 MB |
(All of the tests used "zero-padded FMA3 FFT" on a Haswell CPU.)
Well, this is awkward for Zen 2 users. Some of the currently sent work still fits into the L3 cache of a single CCX (16 MB on the desktop and server Zen 2 parts) and would perform best if pinned to the cores of that one CCX, while other currently sent work already exceeds the L3 cache of one CCX. I suppose a good course of action would be to test both smaller and larger work, and then decide whether to proceed with 1 task : 1 CCX or with 1 task : 2 CCXs for all of the upcoming work.
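For anyone who wants to redo the arithmetic for their own CPU, here is a rough sketch. It assumes 8 bytes per FFT element (which is what every row of the table works out to) and 16 MB of L3 per CCX; the function names are made up for illustration, and the real working set of the application may be somewhat larger than the bare FFT size, so treat a result right at the limit as borderline.

```python
import math

# Assumption: one 8-byte double per FFT element (matches the table: 1920K -> 15 MB).
BYTES_PER_ELEMENT = 8
# Assumption: 16 MB of L3 per CCX (desktop/server Zen 2); change this for other CPUs.
L3_PER_CCX_MB = 16.0


def parse_fft_length(text):
    """Convert an FFT length such as '1920K' or '2M' into a number of elements."""
    units = {"K": 1024, "M": 1024 ** 2}
    return int(text[:-1]) * units[text[-1]]


def fft_size_mb(length_text):
    """FFT data size in MB for a given FFT length string."""
    return parse_fft_length(length_text) * BYTES_PER_ELEMENT / 1024 ** 2


def ccx_needed(length_text, l3_mb=L3_PER_CCX_MB):
    """Smallest number of CCXs whose combined L3 holds the FFT data.

    Optimistic: ignores everything else competing for the cache.
    """
    return math.ceil(fft_size_mb(length_text) / l3_mb)


# FFT lengths that appear in the table above.
for length in ["1792K", "1920K", "2016K", "2M", "2304K"]:
    print(f"{length:>6}: {fft_size_mb(length):6.2f} MB, "
          f"{ccx_needed(length)} CCX(s)")
```

With these assumptions the 2304K work is the only FFT length in the table that spills out of one CCX, while 2M sits exactly at the limit.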