It is official. AMD announced and demonstrated Heavy Metal 32c/64t Threadripper 2. 7nm on the way.


Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,564
14,520
136
CB is not bound by RAM speed. If a 1950X @ 3.4GHz scores 3000, then a 32C @ 4.0GHz should be at 7000. FTR, Zen+ has precisely 3% better perf/Hz than Zen in CB.

Edit :



https://www.computerbase.de/2018-06/amd-ryzen-threadripper-2990x-screenshots-benchmarks/




That's within the error margin:

https://www.computerbase.de/2018-06...ed-edition-cpu-test/2/#diagramm-cinebench-r15
But it's not normal RAM speed. Half of the CCXs leech memory; they have no direct access. Did you read that?
 

LightningZ71

Golden Member
Mar 10, 2017
1,628
1,898
136
But it's not normal RAM speed. Half of the CCXs leech memory; they have no direct access. Did you read that?
OK, so they leech memory. If their working sets fit in their L3 cache in their CCX, and there is sufficient total DRAM bandwidth to supply all the cores with steady cache fills, the added memory access latency won't make a huge difference.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,564
14,520
136
AFAIK, that is still an unknown. Do you know for certain that this is true?
From what I read (I'm not sure I remember the sources; check the replies in this thread, it's there).
There is almost no other way without going to 8-channel memory, which X399 cannot support. It's an EPYC chip, slightly modified to work in the TR4 socket instead of SP3.
 

Abwx

Lifer
Apr 2, 2011
10,953
3,474
136
But it's not normal RAM speed. Half of the CCXs leech memory; they have no direct access. Did you read that?

The data to be computed fits in the cache, and the computed data set is not big either. A huge amount of computation is done around a small set of data. That's rendering, not x264 encoding, where it's the data set that is huge.

Besides, you can check some tests here and there: CB is not sensitive to RAM speed above fairly low rates. That's an advantage of this FP bench.



https://www.youtube.com/watch?v=S5LfrEK_szo
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,564
14,520
136
You are not listening to anything I have said, so just wait until official benchmarks come out. Maybe they can explain it so you can understand.
 

maddie

Diamond Member
Jul 18, 2010
4,744
4,684
136
From what I read (I'm not sure I remember the sources; check the replies in this thread, it's there).
There is almost no other way without going to 8-channel memory, which X399 cannot support. It's an EPYC chip, slightly modified to work in the TR4 socket instead of SP3.
This is from a post in the thread. A few of us were discussing the chance of a 2 core leech OR 1 active memory controller per die instead of the normal 2. The Stilt did a quick memory test and got the results below, suggesting that rerouted traces in the package would be the superior solution. The whole of page 4 of that thread has a lot of posts on the topic. Whether it happens or not is another question.

https://forums.anandtech.com/thread...-top-tdp-of-250w.2547899/page-4#post-39451806

Post:
"Based on the quick synthetic tests the best option would be having 1CH per die, which of course would require active IMCs on each of the dies and totally reworked SP3r2 package (memory pad arrangement wise, which is very unlikely to happen).

1950X at fixed 3.4GHz frequency, 2933MHz MEMCLK CL14-14-14-1T.

3RA

2CPD (NUMA) = 85961MB/s (Read), 86643MB/s (Write), 81097MB/s (Copy), 78.33ns
1CPD (NUMA) = 44458MB/s (Read), 43449MB/s (Write), 40789MB/s (Copy), 78.80ns
2+0 CPD (LEECH) = 34495MB/s (Read), 37059MB/s (Write), 34823MB/s (Copy), 127.00ns"
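
For anyone who wants the relative numbers, here is a small Python sketch comparing the configurations quoted above. The figures are copied from the post as written; the dictionary layout and the helper name `relative_to` are only illustrative, and reading "CPD" as memory channels per die is an assumption, not something the post spells out.

```python
# Figures copied from the quoted post (1950X at 3.4GHz, 2933MHz MEMCLK).
# Bandwidth values are in MB/s, latency is in ns.
# Labels are kept exactly as written in the post.
results = {
    "2CPD (NUMA)":     {"read": 85961, "write": 86643, "copy": 81097, "latency": 78.33},
    "1CPD (NUMA)":     {"read": 44458, "write": 43449, "copy": 40789, "latency": 78.80},
    "2+0 CPD (LEECH)": {"read": 34495, "write": 37059, "copy": 34823, "latency": 127.00},
}


def relative_to(config, baseline, metric):
    """Return results[config][metric] as a fraction of the baseline's value."""
    return results[config][metric] / results[baseline][metric]


if __name__ == "__main__":
    # Compare the leech layout against one channel per die.
    for metric in ("read", "write", "copy", "latency"):
        ratio = relative_to("2+0 CPD (LEECH)", "1CPD (NUMA)", metric)
        print(f"LEECH vs 1CPD, {metric}: {ratio:.2f}x")
```

On these figures the leech layout reads at roughly 78% of the 1CPD bandwidth and shows about 1.6x the latency.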
 

wahdangun

Golden Member
Feb 3, 2011
1,007
148
106
But it will make workloads with fewer than 32 threads suffer, and multithreaded workloads with more than 32 threads are usually not latency sensitive.

So in this case AMD's decision is right: when the workload fits within 2 dies, it will get the highest performance. Maybe AMD will put the highest-binned dies in TR2 as the ones with active IMCs, so it will have maximum performance in lightly threaded applications.
 

Abwx

Lifer
Apr 2, 2011
10,953
3,474
136
You are not listening to anything I have said, so just wait until official benchmarks come out. Maybe they can explain it so you can understand

It shouldn't be long before better leaks eventually show up. Anyway, I predict 7200-7300 pts at 4.2GHz.
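
For reference, the linear scaling estimate used earlier in the thread (a 1950X at 3.4GHz scoring 3000, Zen+ with about 3% better per-clock performance) can be sketched in Python as below. The function name `linear_estimate` is made up for illustration, the 16-core count for the 1950X is taken as given, and perfect scaling with cores and clock is an assumption, so treat the result as an upper bound that ignores power limits and memory layout.

```python
def linear_estimate(base_score, base_cores, base_ghz, cores, ghz, per_clock_gain=0.0):
    """Scale a Cinebench score linearly with core count, clock and per-clock gain.

    Assumes perfect scaling, which real hardware rarely reaches.
    """
    return base_score * (cores / base_cores) * (ghz / base_ghz) * (1.0 + per_clock_gain)


if __name__ == "__main__":
    # Baseline from the thread: 1950X (16 cores) at 3.4GHz scoring 3000.
    for ghz in (4.0, 4.2):
        est = linear_estimate(3000, 16, 3.4, 32, ghz, per_clock_gain=0.03)
        print(f"32C at {ghz}GHz: ~{est:.0f} pts (ideal linear scaling)")
```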
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
I don't see how you can dispute that. The 2990x is a different animal. 16 of the cores do not have direct memory access. It is 4 2700x's, but the memory is gimped. So 4 2700x's are 7312, and this was 6300. So looks very doable to me.

The 2700X averages ~120W (core + SoC power, 2933MHz MEMCLK) in CB R15 nT, meaning the 2990X would need a power limit of around 460W+ to have the same per-die power budget as the 2700X.
That already takes the lower SoC / PCI-E PHY power draw on two of the dies (MC) into account.

I expect the 250W TDP SKUs are configured in the same way as the AM4 Pinnacle Ridge parts, where the actual power limit is >= 35% higher than the TDP (i.e. 337.5W).
Even so, that will not allow even remotely the same per-die power budget as on the 2700X, or even on the 65W parts.
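
The power budget arithmetic above can be checked with a short Python sketch. The constants come from this post; the function names are illustrative only, and applying the same 35% margin to the 65W parts is an assumption based on the statement about AM4 Pinnacle Ridge. The sketch does not model the lower SoC / PCI-E PHY draw on two of the dies, so its naive four-die total comes out above the 460W+ figure.

```python
DIES = 4                  # 2990X compared as "4 2700x's" in the quote above
W_PER_DIE_2700X = 120.0   # ~120W average in CB R15 nT, from this post
LIMIT_FACTOR = 1.35       # power limit >= 35% above TDP, from this post


def power_limit(tdp_w, factor=LIMIT_FACTOR):
    """Actual package power limit in watts for a given TDP."""
    return tdp_w * factor


def per_die_budget(tdp_w, dies=DIES):
    """Power limit split evenly across the dies, in watts."""
    return power_limit(tdp_w) / dies


if __name__ == "__main__":
    print(f"Naive 4 x 2700X total: {DIES * W_PER_DIE_2700X:.1f}W")
    print(f"250W TDP power limit:  {power_limit(250):.1f}W")
    print(f"Per-die budget:        {per_die_budget(250):.1f}W")
    print(f"65W part power limit:  {power_limit(65):.2f}W")
```

Under these assumptions each die gets about 84W, which is below both the ~120W of a 2700X and the limit of a single 65W part.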

1402 seems low for a 8700k, even at base?

1400'ish is correct for 8700K, I've got 1436 with all of the parameters configured to meet Intel specs.
 

lightmanek

Senior member
Feb 19, 2017
387
754
136
That score, even at 4GHz, looks power or cooling limited. We would need a stock score to even start debating whether the lower-than-expected result in CB15 is down to the memory subsystem configuration or simply hitting power limits.
Regardless, these 32C TR2s will be among the fastest workstation CPUs we can buy this year. After years of CPU stagnation, we are finally reliving the CPU wars, which benefit us consumers!
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
It is overclocked so there are no power limits present (OC-Mode).
Also, in OC-Mode the CPU won't drop its frequency unless the temperatures reach 115°C.
The AIO is obviously dwarfed by the power dissipation of the CPU, but it is unlikely to hit the throttling threshold regardless. CB15 nT doesn't take too long to complete on a CPU like that either.

I wouldn't expect to see much of a penalty from the DRAM configuration in CB15 nT, as long as the memory frequency is sufficiently high (>= 2933MHz). If there is a major penalty in Cinebench then the penalty in other workloads which are actually latency sensitive will be abysmal.
 

TheGiant

Senior member
Jun 12, 2017
748
353
106
Do you think the 4 memory channels are enough to feed 32 cores for CFD calculations? I need ECC memory, so 2666MHz is the max frequency, and I need 512GB of capacity.
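
As a rough starting point for that question, the theoretical peak bandwidth of the configuration can be worked out in Python as below. This only assumes the standard 64-bit (8-byte) DDR4 channel width and an even split across cores; the function name is illustrative. It gives a ceiling, not a measured figure, and says nothing about whether a given CFD code is bandwidth bound.

```python
def peak_bandwidth_gbs(channels, mega_transfers_per_s, bus_bytes=8):
    """Theoretical peak DRAM bandwidth in GB/s (10^9 bytes per second).

    bus_bytes=8 reflects a 64-bit DDR4 channel.
    """
    return channels * mega_transfers_per_s * 1e6 * bus_bytes / 1e9


if __name__ == "__main__":
    total = peak_bandwidth_gbs(channels=4, mega_transfers_per_s=2666)
    per_core = total / 32
    print(f"4 channels at 2666: {total:.1f} GB/s peak")
    print(f"Even share across 32 cores: {per_core:.2f} GB/s per core")
```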