Discussion ZEN2 2-CCD chip inter core latencies = EPYC advances in server rooms

JoeRambo

Golden Member
Jun 13, 2013
1,814
2,105
136
While reading Stilt's take on ZEN2 on overclock.net, I found this little gem in the thread, which comes from the Russian site https://3dnews.ru/990367/obzor-amd-ryzen-9-3900x

https://3dnews.ru/assets/external/illustrations/2019/07/08/990367/image_2019-07-08_12-51-48.png

It might be DDR4-3600 lowering the overall latency levels, but those inter-CCD latencies are going to be amazing for server CPUs and will lead to quite a few wins. No more hidden gotchas when moving between CCXs and beyond, just a flat, sweet ~70 ns or less for all inter-thread comms.

Not only will AMD definitely rule those 1-2S systems, but it seems they will do so with amazing technical grace.
 

TheGiant

Senior member
Jun 12, 2017
748
353
106
Let's wait for database benchmarks.
Looks very good.
 

zir_blazer

Golden Member
Jun 6, 2013
1,164
406
136
I was surprised that the two CCXs in a CPU chiplet are not connected to each other directly but instead have to go through the IO chiplet. It seems to be a mix between the Pentium D Smithfield, which was two adjacent Prescott dies cut from the wafer as a single piece but with no internal communication, so they had to go through the chipset to talk to each other, and the Core 2 Quad Kentsfield, which was an MCM based on two Conroe dies where the two cores of each Conroe saw each other and shared the L2 cache, but had to go through the chipset, too, to talk to the other Conroe die.
At first it doesn't make sense, since there are twice as many traces between a CPU chiplet and the IO chiplet as there would be with a crossbar inside the CPU chiplet, plus the inter-CCX latency within a chiplet is increased. But somehow it is far more elegant, as having two latency tiers is simpler than having three from a dumb CPU scheduler's standpoint. There is technically no penalty for going from one chiplet to two or more in that regard.
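
To make the two-tier idea concrete, here is a rough Python sketch. It assumes a 12-core layout like the 3900X (2 chiplets x 2 CCXs x 3 active cores) and a made-up sequential core numbering, so the constants and function names are purely illustrative and not anything AMD or an OS exposes. It only classifies core pairs into tiers; it does not measure anything.

```python
from itertools import combinations

# Assumed layout for illustration only: 2 chiplets x 2 CCXs x 3 cores.
# The sequential core numbering is hypothetical; real OS numbering may differ.
CORES_PER_CCX = 3
CCX_PER_CCD = 2
CCD_COUNT = 2


def locate(core):
    """Return (chiplet, ccx) indices for a core id under the assumed numbering."""
    ccx_global = core // CORES_PER_CCX
    return ccx_global // CCX_PER_CCD, ccx_global % CCX_PER_CCD


def latency_tier(a, b):
    """Classify a core pair into one of the two tiers discussed above."""
    if locate(a) == locate(b):
        return "same-ccx"
    # Any other pair, same chiplet or not, talks through the IO chiplet.
    return "via-io-die"


def tier_counts():
    """Count how many distinct core pairs fall into each tier."""
    total = CORES_PER_CCX * CCX_PER_CCD * CCD_COUNT
    counts = {}
    for a, b in combinations(range(total), 2):
        tier = latency_tier(a, b)
        counts[tier] = counts.get(tier, 0) + 1
    return counts
```

Under these assumptions, cores 0 and 3 sit on the same chiplet but in different CCXs, and they land in the same tier as cores 0 and 6 on different chiplets. That is the whole point: a scheduler only has to tell "same CCX" apart from "everything else".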

Where will you notice this the most? Threadripper and EPYC, obviously.
 