It is official. AMD announced and demonstrated Heavy Metal 32c/64t Threadripper 2. 7nm on the way.


Markfw

CPU Moderator, VC&G Moderator, Elite Member
Super Moderator
May 16, 2002
17,775
1,380
136

IEC

Super Moderator
Super Moderator
Jun 10, 2004
13,565
390
136
1402 seems low for an 8700K, even at base?
That's in the normal range. I get around there at stock. Overclocked, I get 1557 @ 4.7GHz core + cache, no AVX offset.
 

LightningZ71

Senior member
Mar 10, 2017
245
7
86
But it's not normal RAM speed. Half of the CCXs leech memory, with no direct access. Did you read that?
OK, so they leech memory. If their working sets fit in their L3 cache in their CCX, and there is sufficient total DRAM bandwidth to supply all the cores with steady cache fills, the added memory access latency won't make a huge difference.
 

maddie

Platinum Member
Jul 18, 2010
2,587
490
136
But it's not normal RAM speed. Half of the CCXs leech memory, with no direct access. Did you read that?
AFAIK, that is still an unknown. Do you know for certain that this is true?
 

Markfw

CPU Moderator, VC&G Moderator, Elite Member
Super Moderator
May 16, 2002
17,775
1,380
136
AFAIK, that is still an unknown. Do you know for certain that this is true?
From what I have read, yes (I'm not sure I remember the sources; check the replies in this thread, it's there).
There is almost no other way without going to 8-channel memory, which X399 cannot support. It's an EPYC chip, slightly modified to work in the TR4 socket instead of SP3.
 

Abwx

Diamond Member
Apr 2, 2011
8,870
213
126
But it's not normal RAM speed. Half of the CCXs leech memory, with no direct access. Did you read that?
The data to be computed fits in the cache, and the computed data set is not big either. A huge amount of computing is done on a small set of data; this is rendering, not an x264 encode where the data set itself is huge.

Besides, you can check some tests here and there: CB is not sensitive to RAM speed above fairly low rates, which is an advantage of this FP bench.



https://www.youtube.com/watch?v=S5LfrEK_szo
 

Markfw

CPU Moderator, VC&G Moderator, Elite Member
Super Moderator
May 16, 2002
17,775
1,380
136
You are not listening to anything I have said, so just wait until official benchmarks come out. Maybe they can explain it so you can understand.
 

maddie

Platinum Member
Jul 18, 2010
2,587
490
136
From what I have read, yes (I'm not sure I remember the sources; check the replies in this thread, it's there).
There is almost no other way without going to 8-channel memory, which X399 cannot support. It's an EPYC chip, slightly modified to work in the TR4 socket instead of SP3.
This is from a post in the thread. A few of us were discussing the chance of a 2 core leech OR 1 active memory controller per die instead of the normal 2. The Stilt did some quick memory tests and got the results below, suggesting that rerouted traces in the package would be a superior solution. The whole of Pg4 of that thread has a lot of posts on the topic. Whether it happens or not is another question.

https://forums.anandtech.com/thread...-top-tdp-of-250w.2547899/page-4#post-39451806

Post:
"Based on the quick synthetic tests the best option would be having 1CH per die, which of course would require active IMCs on each of the dies and totally reworked SP3r2 package (memory pad arrangement wise, which is very unlikely to happen).

1950X at fixed 3.4GHz frequency, 2933MHz MEMCLK CL14-14-14-1T.

3RA

2CPD (NUMA) = 85961MB/s (Read), 86643MB/s (Write), 81097MB/s (Copy), 78.33ns
1CPD (NUMA) = 44458MB/s (Read), 43449MB/s (Write), 40789MB/s (Copy), 78.80ns
2+0 CPD (LEECH) = 34495MB/s (Read), 37059MB/s (Write), 34823MB/s (Copy), 127.00ns"
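
To put the quoted figures side by side, here is a minimal sketch that expresses each configuration as a fraction of the 2CPD baseline. The numbers are copied from the post above; the dictionary layout and the helper name `relative_to` are only illustrative assumptions, not anything The Stilt used.

```python
# Figures copied from the quoted post (1950X, 3.4GHz fixed, 2933MHz MEMCLK).
# Bandwidth values are in MB/s; keys reuse the post's own labels.
results = {
    "2CPD (NUMA)": {"read": 85961, "write": 86643, "copy": 81097, "latency_ns": 78.33},
    "1CPD (NUMA)": {"read": 44458, "write": 43449, "copy": 40789, "latency_ns": 78.80},
    "2+0 CPD (LEECH)": {"read": 34495, "write": 37059, "copy": 34823, "latency_ns": 127.00},
}

BASELINE = "2CPD (NUMA)"


def relative_to(baseline, config, metric):
    """Return the config's metric as a fraction of the baseline's metric."""
    return results[config][metric] / results[baseline][metric]


for config in results:
    read_ratio = relative_to(BASELINE, config, "read")
    latency_ratio = relative_to(BASELINE, config, "latency_ns")
    print(f"{config}: read {read_ratio:.0%} of baseline, latency {latency_ratio:.2f}x")
```

On these numbers the leech configuration reads at roughly 40% of the 2CPD baseline with about 1.6x the latency, while 1CPD reads at roughly 52% with essentially unchanged latency.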
 

wahdangun

Senior member
Feb 3, 2011
993
2
106
But that option would make workloads with fewer than 32 threads suffer, and workloads that use more than 32 threads are usually not latency sensitive.

So in this case AMD's decision is right: when the workload fits within 2 dies, it gets the highest performance. Maybe AMD will also use its highest-binned dies as the ones with active IMCs in TR2, so that less-threaded applications get maximum performance.
 

Abwx

Diamond Member
Apr 2, 2011
8,870
213
126
You are not listening to anything I have said, so just wait until official benchmarks come out. Maybe they can explain it so you can understand.
It shouldn't be long before there are better leaks; anyway, I predict 7200-7300 pts at 4.2GHz.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
85
106
I don't see how you can dispute that. The 2990x is a different animal. 16 of the cores do not have direct memory access. It is 4 2700x's, but the memory is gimped. So 4 2700x's are 7312, and this was 6300. So looks very doable to me.
The 2700X averages ~120W (Core + SoC power, 2933MHz MEMCLK) in CB R15 nT, meaning the 2990X would need a power limit of around 460W+ to have the same per-die power budget as the 2700X.
That already takes the lower SoC / PCI-E PHY power draw on two of the dies (MC) into account.

I expect the 250W TDP SKUs are configured in the same way as the AM4 Pinnacle Ridge parts, where the actual power limit is >= 35% higher than the TDP (i.e. 337.5W).
Even so, that will not allow anywhere near the same per-die power budget as the 2700X, or even the 65W parts.
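
The arithmetic behind that estimate can be sketched as below. The 120W, 250W, 35% and 460W figures come from the post; the four-die count follows the "4 2700x's" framing in the quote, and applying the same 35% headroom to a 65W part is an assumption made here purely for comparison. All variable names are illustrative.

```python
# Figures from the post; names are illustrative only.
POWER_2700X_W = 120.0       # ~average Core + SoC power in CB R15 nT
DIES = 4                    # 2990X treated as four 2700X-class dies
TDP_2990X_W = 250.0
LIMIT_MULTIPLIER = 1.35     # "actual power limit is >= 35% higher than the TDP"
PARITY_LIMIT_W = 460.0      # the post's estimate, already net of lower SoC/PHY draw

# Expected package power limit and the share each die gets from it.
power_limit_w = TDP_2990X_W * LIMIT_MULTIPLIER      # about 337.5 W
per_die_budget_w = power_limit_w / DIES             # about 84.4 W

# Naive parity figure before the SoC/PHY savings the post subtracts.
naive_parity_w = POWER_2700X_W * DIES               # 480 W

# Assumption: a 65W AM4 part gets the same 35% headroom.
limit_65w_part_w = 65.0 * LIMIT_MULTIPLIER          # about 87.8 W

print(f"Power limit: {power_limit_w:.1f} W")
print(f"Per-die budget: {per_die_budget_w:.1f} W "
      f"({per_die_budget_w / POWER_2700X_W:.0%} of a 2700X)")
print(f"Limit as share of parity estimate: {power_limit_w / PARITY_LIMIT_W:.0%}")
print(f"65W part limit under the same assumption: {limit_65w_part_w:.1f} W")
```

Under these assumptions each die gets roughly 84W, about 70% of what the 2700X draws, and slightly less than the roughly 88W a 65W part would be allowed.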

1402 seems low for an 8700K, even at base?
1400-ish is correct for the 8700K; I got 1436 with all of the parameters configured to meet Intel specs.
 
Feb 19, 2017
88
33
76
That score, even at 4GHz, looks power or cooling limited. We would need a stock score to even start debating whether the lower-than-expected CB15 result is down to the memory subsystem configuration or simply to hitting power limits.
Regardless, these 32C TR2s will be among the fastest workstation CPUs we can buy this year. After years of CPU stagnation, we are finally reliving the CPU wars, which benefit us consumers!
 

The Stilt

Golden Member
Dec 5, 2015
1,709
85
106
It is overclocked so there are no power limits present (OC-Mode).
Also, in OC-Mode the CPU won't drop its frequency unless the temperatures reach 115°C.
The AIO is obviously dwarfed by the power dissipation of the CPU, but the CPU is unlikely to hit the throttling threshold regardless. CB15 nT doesn't take long to complete on a CPU like that either.

I wouldn't expect to see much of a penalty from the DRAM configuration in CB15 nT, as long as the memory frequency is sufficiently high (>= 2933MHz). If there is a major penalty in Cinebench then the penalty in other workloads which are actually latency sensitive will be abysmal.
 

TheGiant

Senior member
Jun 12, 2017
410
68
86
Do you think the 4 memory channels are enough to satisfy 32 cores for CFD calculations? I need ECC memory, so 2666 MHz is the max frequency, and 512GB of capacity.
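
As a rough way to frame that question, the sketch below computes the theoretical peak DRAM bandwidth for 4 channels at 2666 and the share per core. It assumes "2666 MHz" means DDR4-2666 (2666 MT/s) on standard 64-bit channels, ignores NUMA and leech effects entirely, and gives only an upper bound; whether that is enough for a given CFD code is not something it can answer. The function name is hypothetical.

```python
def peak_bandwidth_gbs(transfers_mt_s, channels, bytes_per_transfer=8):
    """Theoretical peak DRAM bandwidth in GB/s.

    Assumes each channel is 64 bits wide (8 bytes per transfer).
    """
    return transfers_mt_s * 1e6 * bytes_per_transfer * channels / 1e9


CORES = 32
total_gbs = peak_bandwidth_gbs(2666, channels=4)
per_core_gbs = total_gbs / CORES

print(f"Peak total: {total_gbs:.1f} GB/s")
print(f"Peak per core with {CORES} cores: {per_core_gbs:.2f} GB/s")
```

That works out to about 85 GB/s in total, or under 3 GB/s per core, before any real-world efficiency loss.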
 

