@nicalandia, my point is not about the number of CCDs per processor (*), but about the number of cores per CCX (**). The latter will without doubt determine how many threads per process a data-intensive computational program (one with data dependencies between its threads) should use for optimum throughput.
Published measurements have shown that inter-CCX communication and inter-CCD communication perform virtually identically, and that both perform the same as, or only somewhat better than, memory accesses (*). By contrast, inter-core communication within the same CCX is faster. That's the whole reason to organize cores in CCXs in the first place. (**)
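To make the threads-per-process point concrete, here is a rough sketch in Python (Linux only) of sizing each worker process to one CCX and pinning it there. It assumes that CPU IDs are numbered consecutively within a CCX and it ignores SMT siblings; neither is guaranteed, so check the real layout with `lscpu` or `hwloc` first. The helper names `ccx_groups` and `pin_to_ccx` and the core counts are made up for illustration.

```python
import os


def ccx_groups(n_cpus, cpus_per_ccx):
    """Split CPU IDs 0..n_cpus-1 into consecutive groups, one per CCX.

    Assumption: CPU IDs are contiguous within a CCX. Verify this against
    the actual topology of the machine before relying on it.
    """
    if cpus_per_ccx <= 0 or n_cpus % cpus_per_ccx != 0:
        raise ValueError("n_cpus must be a positive multiple of cpus_per_ccx")
    return [
        set(range(start, start + cpus_per_ccx))
        for start in range(0, n_cpus, cpus_per_ccx)
    ]


def pin_to_ccx(groups, worker_index):
    """Restrict the calling process to the CPUs of one CCX (Linux only)."""
    cpus = groups[worker_index % len(groups)]
    os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
    return cpus


if __name__ == "__main__":
    # Hypothetical 16-core part with 4 cores per CCX: 4 processes of 4 threads.
    groups = ccx_groups(16, 4)
    print(len(groups), "processes,", len(groups[0]), "threads each")
```

Each worker process would call `pin_to_ccx(groups, i)` at startup and then spawn as many threads as its CCX has cores, so that threads sharing data stay within one CCX.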