Possible Zen 2 Thread Ripper sightings


nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
This would mean 8 dual-core CCXs. This is interesting for running many concurrent processes which do not communicate much, or at all. Multithreaded applications with appreciable inter-thread synchronization needs may fare better on processors with fully enabled (that is, 4-core) CCXs. That's my guess at least.
That will not be the case, because the IO die handles the inter-CCX communication. At least we don't see any penalties with the 3900X, which has two CCDs with 6 cores each (4 CCXs with 3 cores each).
 

StefanR5R

Elite Member
Dec 10, 2016
5,643
8,117
136
@nicalandia, my point is not the number of CCDs per processor*, but the number of cores per CCX**. The latter will without doubt determine how many threads per process a data-intensive computational program (with data dependency between the threads) should have for optimum throughput.

Published measurements have shown inter-CCX communication and inter-CCD communication perform virtually identically, and perform the same as or only somewhat better than memory accesses (*). This is in contrast to inter-core communication within the same CCX. That's the whole reason to organize cores in CCXs in the first place. (**)
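
To make the threads-per-process point concrete, here is a minimal sketch that splits a processor's cores into CCX-sized groups, one group per worker process. The function name `ccx_groups` is made up for illustration, and the sketch assumes the OS numbers cores contiguously within each CCX, which is not guaranteed; check the real topology (for example with hwloc's lstopo) before pinning anything.

```python
def ccx_groups(total_cores, cores_per_ccx):
    """Split core indices 0..total_cores-1 into consecutive CCX-sized groups.

    Assumption: cores are numbered contiguously within each CCX.
    """
    if total_cores <= 0 or cores_per_ccx <= 0:
        raise ValueError("core counts must be positive")
    if total_cores % cores_per_ccx != 0:
        raise ValueError("total_cores must be a multiple of cores_per_ccx")
    return [
        list(range(start, start + cores_per_ccx))
        for start in range(0, total_cores, cores_per_ccx)
    ]


if __name__ == "__main__":
    # Compare 4-, 3- and 2-core CCXs on a hypothetical part with four CCXs.
    for cores_per_ccx in (4, 3, 2):
        groups = ccx_groups(cores_per_ccx * 4, cores_per_ccx)
        print(f"{cores_per_ccx} threads per worker, core groups: {groups}")
```

Each inner list is the set of cores one worker would be confined to, so the number of threads per worker equals the number of cores per CCX.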
 

wahdangun

Golden Member
Feb 3, 2011
1,007
148
106
@nicalandia, my point is not the number of CCDs per processor*, but the number of cores per CCX**. The latter will without doubt determine how many threads per process a data-intensive computational program (with data dependency between the threads) should have for optimum throughput.

Published measurements have shown inter-CCX communication and inter-CCD communication perform virtually identically, and perform the same as or only somewhat better than memory accesses (*). This is in contrast to inter-core communication within the same CCX. That's the whole reason to organize cores in CCXs in the first place. (**)
But with Zen 2, it doesn't matter where the core is; each core will have to go through the IO die to communicate.
 

Kedas

Senior member
Dec 6, 2018
355
339
136
But with Zen 2, it doesn't matter where the core is; each core will have to go through the IO die to communicate.
Each CCX has to go through the IO die.

I wonder what the speed difference will be between a TR 16 core 4 channel and 3950X (2 channel) in games.
I don't see an obvious disadvantage for TR but who knows...
It will have a higher idle power I assume.
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
Published measurements have shown inter-CCX communication and inter-CCD communication perform virtually identically, and perform the same as or only somewhat better than memory accesses (*). This is in contrast to inter-core communication within the same CCX. That's the whole reason to organize cores in CCXs in the first place.
So far the 3900X has not been crippled against the 3800X, and it still has 12 cores in a 3+3+3+3 configuration, so a 2+2+2+2 will have little if any penalty compared to 4+4+ due to the IO die handling the communications. Yes, it's not optimal, but those 8-core Epycs are meant for systems that are PCI-heavy rather than throughput-oriented, so I doubt that we will see them on Threadripper. I believe the last 8-core Threadripper was the 1900X.
 

StefanR5R

Elite Member
Dec 10, 2016
5,643
8,117
136
Sure, if you have little synchronization, there are little penalties for spreading an application over multiple CCXs. Yet there are applications with a good amount of sharing of hot data between threads, such as fast fourier transforms. It will be interesting to measure those on 4-/ 3-/ 2-core CCX SKUs.
 

nicalandia

Diamond Member
Jan 10, 2019
3,331
5,282
136
Sure, if you have little synchronization, there are little penalties for spreading an application over multiple CCXs. Yet there are applications with a good amount of sharing of hot data between threads, such as fast fourier transforms. It will be interesting to measure those on 4-/ 3-/ 2-core CCX SKUs.
There was a review that turned "Game" mode on for the 3900X (the mode was initially created for Threadripper; it basically turns off all but one 8-core compute unit), which basically turned the 3900X into a 3600X (3+3, single CCD). In 99% of the games it actually got worse performance using "Game" mode. I would guess that the loss of performance was due to lower total cache and not due to 3+3 vs 3+3+3+3 communication penalties.

https://www.legitreviews.com/game-mode-might-boost-performance-on-amd-ryzen-3900x-processors_213087

"When we ran other game titles with ‘Game Mode’ enabled on the 3900X we found that performance dropped. We only ran Rainbow Six: Siege, Far Cry 5 and Metro Exodus in our CPU test suite and of those titles only Metro Exodus showed a performance improvement with Game Mode enabled. It is likely that 99% of game titles run better in ‘Creator Mode’ and we hope that is the case. Now that this news is out the sites that have dozens of games that can be tested with automation will be able to crank out results in a timely manner from a larger data set."
 

StefanR5R

Elite Member
Dec 10, 2016
5,643
8,117
136
My posts were about Xs, not about Ds. ;-)


(Edit, also, about work, not play.)
 

StefanR5R

Elite Member
Dec 10, 2016
5,643
8,117
136
Are there any downloadable benchmarks that utilize fast fourier transforms?
Prime95 for example. It is using the gwnum library for the heavy lifting and spends most of the processor time in FFT. The FFT size can be configured, and (times 8 for size in bytes) determines the size of the hot data that should be kept in a processor cache shared by all cores used by one worker process.
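
As a rough illustration of the "times 8" rule above, the sketch below converts an FFT size into the size of the hot data and compares it with the L3 of one CCX. The 16 MB figure is an assumption for a Zen 2 CCX, and the helper names are made up for this example; it also ignores any other data that competes for the cache, so treat the result as an upper bound on what fits.

```python
# Assumption: 16 MB of L3 per CCX, as on Zen 2 desktop parts.
L3_PER_CCX_BYTES = 16 * 1024 * 1024


def fft_hot_bytes(fft_size):
    """Hot data size for one worker: FFT size times 8 bytes per element."""
    return fft_size * 8


def fits_in_l3(fft_size, l3_bytes=L3_PER_CCX_BYTES):
    """True if the hot data alone would fit into the given L3 capacity."""
    return fft_hot_bytes(fft_size) <= l3_bytes


if __name__ == "__main__":
    # FFT sizes given in K (1K = 1024 elements).
    for size_k in (512, 1024, 2048, 4096):
        n = size_k * 1024
        mib = fft_hot_bytes(n) / (1024 * 1024)
        verdict = "fits" if fits_in_l3(n) else "does not fit"
        print(f"{size_k}K FFT: {mib:.0f} MiB hot data, {verdict} in one CCX's L3")
```

Under these assumptions, a worker whose FFT spills out of one CCX's L3 is the case where the number of cores per CCX would be expected to matter most.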

(I've been using the FFTW library in structural dynamics myself. I don't recall a ready-made benchmark for FFTW; maybe there is one, but those who use FFTW should have little trouble benchmarking their own use case. I have also been using a CFD program which relies heavily on FFT; it's been a while, so I don't recall whether it used a home-grown implementation or a library which would enable standalone benchmarks.)
 

Shmee

Memory & Storage, Graphics Cards Mod Elite Member
Super Moderator
Sep 13, 2008
7,498
2,520
146
It certainly will be interesting to see if HEDT with Zen 2 TR makes a comeback in gaming. That would be cool after the disappointing gaming performance of X299 and X399. My last HEDT board was the Asus X99 Deluxe with a 5930K. Recently I have moved on to AM4.
 

DrMrLordX

Lifer
Apr 27, 2000
21,748
11,070
136
Prime95 for example.

That's what I thought. I wasn't entirely certain though, so you've confirmed my suspicion that FFT = fast fourier transform. Might be able to use it to measure interthread communication performance on various Zen2 chips.