[pcgamesn] AMD is giving Threadripper 2 moar cores and a top TDP of 250W

CatMerc

Golden Member
Jul 16, 2016
1,114
1,149
136
There have to be 1 DPC 8CH TR4 boards incoming for these CPUs, as otherwise they'll be the most futile consumer HEDT CPUs ever released.
It's effectively a 4P CPU with half of the cores having no local DRAM available, meaning their memory accesses will be routed through the GMI interface (> 120ns on top of the normal latency) to the dies with the memory present.
Unless IF has tricks up its sleeve, this thing sounds awful.
 

mattiasnyc

Senior member
Mar 30, 2017
356
337
136
If you're buying 32 cores and they're not branded Epyc, you probably don't care much about the memory latency - you care about core throughput. Which a 250W TDP platform with 4x binned 2700X dies would bring in spades. No one buys a 32 core CPU for gaming.

Probably depends on the workload. If Epyc is for servers and Ryzen is prosumer and down, then TR is probably the workstation CPU. For those of us in media creation it's going to be highly dependent on what we do. Mixing audio is fine. Running virtual instruments and playing them back with low latency? Probably not. Visual media creation? Probably fine for render computers, might be problematic for some other uses...
 

VirtualLarry

No Lifer
Aug 25, 2001
56,325
10,034
126
Kind of makes me wonder how well XMR mining goes on a TR2 CPU, with the added latency for two of the dies. Then again, it primarily exercises the L3 cache, so with two more active dies, I assume that it will have twice the total L3 cache of the original TR CPUs. Those, it was mentioned, could pay for themselves within a year mining XMR in the background. Let's see if TR2 can pay for itself.
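
A back-of-the-envelope sketch of why L3 size matters here, assuming the CryptoNight algorithm that XMR used at the time, which wants a 2 MiB scratchpad per mining thread. The 16 MiB of L3 per die matches the original Zen dies; that TR2 keeps the same amount per die is an assumption, and the function name is made up for illustration.

```python
# Rough estimate of how many CryptoNight threads fit in L3.
# Assumptions (not from the thread): 2 MiB scratchpad per thread,
# 16 MiB of L3 per die, and that TR2 simply enables four such dies.

SCRATCHPAD_MIB = 2
L3_PER_DIE_MIB = 16


def max_cache_resident_threads(active_dies: int) -> int:
    """Threads whose scratchpads fit entirely in the combined L3."""
    total_l3 = active_dies * L3_PER_DIE_MIB
    return total_l3 // SCRATCHPAD_MIB


tr1_threads = max_cache_resident_threads(active_dies=2)
tr2_threads = max_cache_resident_threads(active_dies=4)
print(f"TR (2 dies): {tr1_threads} threads, TR2 (4 dies): {tr2_threads} threads")
```

This only counts cache capacity. It says nothing about whether threads on the dies without local memory would hash as fast as the others.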

I'm pretty excited. I already (just) bought the 2700X and an Asus X470 Prime Pro ATX board (Newegg combo, $55 off), and a nice DIYPC Vanguard tempered-glass case with 4x RGB fans and a fan / lighting controller built-in. QC and longevity may not be the best, but it has a LOT of nice features in a case. This may be DIYPC's best case yet. It was on sale for $89 with a $10 MIR, which I probably won't bother to send in.

But even though this is my first 8C/16T CPU, I'm salivating at the idea of a 32C/64T machine. Crazy good!

Oh yeah, the 2700X is finally going to get 32GB of RAM. My R5 1600 rigs with dual RX 470/570 GPUs, mining on all three (2x GPU and 1x CPU), take up enough virtual memory, that opening maybe 100-200 tabs in Firefox Nightly pushes me over the limit of 16GB of physical RAM.
 

moonbogg

Lifer
Jan 8, 2011
10,635
3,095
136
This is so amazing! Sick! 32 cores on the desktop! We were stuck with 8 just a few years ago for a grand! Thanks AMD!
Also, on a side note, it's been scientifically proven (by someone somewhere probably) that Intel's SUDDEN lightning release of a 28 core unicorn CPU is not, I repeat, is not a reaction to AMD in any way, shape or form. Intel just does Intel. That's it baby.
 

Gideon

Golden Member
Nov 27, 2007
1,619
3,645
136
Hah, Called It!

Just look at how many AMD execs mentioned Threadripper 2 as the most exciting product (for them) in 2018, and at Intel bolting on their 28 core presentation... there had to be something coming from AMD that did more than just bump clock speeds by 10%.

The latency will be awful for those cores, for sure, yet if the rumors are true of 7nm Rome having a 5th (14nm?) die for memory controller and all the peripherals, Threadripper 2 would be a good test product for a chip with no direct connection to memory ...
 

LTC8K6

Lifer
Mar 10, 2004
28,520
1,575
126
This is so amazing! Sick! 32 cores on the desktop! We were stuck with 8 just a few years ago for a grand! Thanks AMD!
Also, on a side note, it's been scientifically proven (by someone somewhere probably) that Intel's SUDDEN lightning release of a 28 core unicorn CPU is not, I repeat, is not a reaction to AMD in any way, shape or form. Intel just does Intel. That's it baby.
How does Intel get info on AMD releases ahead of time, though? :)
 

Insert_Nickname

Diamond Member
May 6, 2012
4,971
1,691
136
Challenge Accepted.

If I was looking at plonking down that amount on such a system, I'm pretty sure it would say Epyc on the label. I wouldn't want compromises when spending that kind of money.

But yes, 32C/64T for gaming FTW... :D

But even though this is my first 8C/16T CPU, I'm salivating at the idea of a 32C/64T machine. Crazy good!

The sheer idea of a 32C consumer class system already has me somewhat baffled. These are good times indeed.

I might splurge a bit and get a 2900X system. Don't really need more than 8 cores, but I really could put those 60 PCIe lanes to good use. High frequency and improved turbo boost are just icing on the cake.
 
  • Like
Reactions: DarthKyrie

wahdangun

Golden Member
Feb 3, 2011
1,007
148
106
Btw, with TR going up to 32 cores, there will be 2 die arrangements: one with 2 active dies and another with 4 active dies.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,025
136

Once in a blue moon WCCFTech gets something right and PCGamesN parrots it; the other 99% of the time it's fake and things just get edited later. (Unless something has changed.)

Regardless, the clocks look abysmal, and it looks like the new PB2 features, etc. will NOT be coming to the new chips? Otherwise boost should be as high as 4.3GHz (single core). I don't believe latency issues will be that bad. Existing tools in Ryzen Master for the 1950X/1920X/1900X will probably be modified to work around this.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,025
136
If you're buying 32 cores and they're not branded Epyc, you probably don't care much about the memory latency - you care about core throughput. Which a 250W TDP platform with 4x binned 2700X dies would bring in spades. No one buys a 32 core CPU for gaming.

Bwahaha, they said that about the 1950X, yet it overclocks better, and the additional threads and quad channel actually help some games run faster. However, the 32 core chip...being 250W TDP...yeah...no. Hopefully they have a 16 core version that clocks higher. Otherwise they just saved me a ton of money.
 

wahdangun

Golden Member
Feb 3, 2011
1,007
148
106
Bwahaha, they said that about the 1950X, yet it overclocks better, and the additional threads and quad channel actually help some games run faster. However, the 32 core chip...being 250W TDP...yeah...no. Hopefully they have a 16 core version that clocks higher. Otherwise they just saved me a ton of money.


Yeah, especially when Epyc 2 launches in 2019 with 64 cores, PCIe 4, 7nm, and a new, higher-performance Infinity Fabric. God damnit, I'm so overhyped and overwhelmed. Bring it on, 2019, it's gonna be an Epyc year.
 
  • Like
Reactions: IEC

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
I don't believe latency issues will be that bad. Existing tools in Ryzen master for the 1950X/1920X/1900X will probably be modified to work around this.

How do you alter the fact that half of the dies need to use GMI for memory accesses 100% of the time, using software? Other than by shutting down the "leech" dies completely, which obviously isn't a very productive thing to do.
With 1st gen. TR in distributed mode, some of the memory accesses cross the GMI interface when the data isn't available in the local (die-specific) memory.
This hurts the latency badly, even though not all of the accesses go through the GMI (which won't be the case with half of the dies / cores on gen. 2 TR).
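
A minimal sketch of that arithmetic, treating average latency as a weighted mix of local and remote accesses. The latency figures are placeholders, not values read off the chart below, and the 50% remote share for distributed mode is an assumption (memory interleaved evenly across both dies).

```python
# Illustrative only: the latency figures below are placeholders,
# not measurements from this thread.

def average_latency_ns(local_ns: float, remote_ns: float, remote_fraction: float) -> float:
    """Expected DRAM latency when a given fraction of accesses cross GMI."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


LOCAL_NS = 80.0    # hypothetical latency to die-local memory
REMOTE_NS = 200.0  # hypothetical latency when the access crosses GMI

# Gen. 1 TR in distributed mode: assume about half the accesses are remote.
gen1_mixed = average_latency_ns(LOCAL_NS, REMOTE_NS, remote_fraction=0.5)
# Gen. 2 TR die without DIMMs: every access is remote.
gen2_leech = average_latency_ns(LOCAL_NS, REMOTE_NS, remote_fraction=1.0)

print(f"gen 1, distributed: {gen1_mixed:.0f} ns")
print(f"gen 2, die without local memory: {gen2_leech:.0f} ns")
```

Whatever the real numbers are, the die with no local memory always sits at the remote end of the range, and software scheduling cannot change that fraction.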

2_-_tr_cache_3200.png
 
  • Like
Reactions: Drazick and CatMerc

CatMerc

Golden Member
Jul 16, 2016
1,114
1,149
136
How do you alter the fact that half of the dies need to use GMI for memory accesses 100% of the time, using software? Other than by shutting down the "leech" dies completely, which obviously isn't a very productive thing to do.
With 1st gen. TR in distributed mode, some of the memory accesses cross the GMI interface when the data isn't available in the local (die-specific) memory.
This hurts the latency badly, even though not all of the accesses go through the GMI (which won't be the case with half of the dies / cores on gen. 2 TR).

2_-_tr_cache_3200.png
Those Creator Mode numbers are an average of local and remote memory access, right? Are there latency numbers for just remote access?
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,025
136
How do you alter the fact that half of the dies need to use GMI for memory accesses 100% of the time, using software? Other than by shutting down the "leech" dies completely, which obviously isn't a very productive thing to do.
With 1st gen. TR in distributed mode, some of the memory accesses cross the GMI interface when the data isn't available in the local (die-specific) memory.
This hurts the latency badly, even though not all of the accesses go through the GMI (which won't be the case with half of the dies / cores on gen. 2 TR).

2_-_tr_cache_3200.png

Strict "Game Mode" IMO was always useless on Threadripper. Simply setting the memory mode to 'local' or 'channel' mode gained pretty much all of the performance benefits of 'game mode' without disabling potential 'extra' cores. I've also found that some games utilize those extra cores, and I expect more usage down the road. AMD is on the right track here, while Intel is clearly spinning its wheels and trying to rig a demo.

Note that the reason Cinebench wasn't shown off today is that final clocks have yet to be determined. If AMD can get single threaded performance up to above 2700X levels, overclockers could probably (again) get a stable 4.0 GHz all core overclock (albeit with a beefier cooling system, but nothing like Intel's). AMD SHOULD show off a decent overclocked Cinebench result. The extra cores on the new chip would push the score past what Intel scored if they could get the clocks high enough (4.2 GHz or so should do it).
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
If AMD can get single threaded performance up to above 2700X levels

I'd expect the maximum single core to be either the same or ~50MHz higher than on 2700X (4.35 - 4.4GHz).
Golden samples or not, they will be running into the voltage wall anyway (1.550V is the highest available per SVI2 spec).
 

ub4ty

Senior member
Jun 21, 2017
749
898
96
How do you alter the fact that half of the dies need to use GMI for memory accesses 100% of the time, using software? Other than by shutting down the "leech" dies completely, which obviously isn't a very productive thing to do.
With 1st gen. TR in distributed mode, some of the memory accesses cross the GMI interface when the data isn't available in the local (die-specific) memory.
This hurts the latency badly, even though not all of the accesses go through the GMI (which won't be the case with half of the dies / cores on gen. 2 TR).

2_-_tr_cache_3200.png
Correct. What immediately stood out to me, after my excitement in finding out this could be placed in the current socket, was how in the world the dies were being fed. I definitely was disappointed when I realized the obvious, as confirmed here:
TR_Layout_Unofficial_AT_575px.png

All is not lost, but a lot will be: this will no doubt impact performance, and the two new dies are going to be underutilized because they can only be fed over the Infinity Fabric and are limited to its bandwidth. Then come complications like: what in the heck does the PCIe root complex look like in this kind of config? Those dies have to cross the Infinity Fabric for all I/O... This will definitely bring up some unique cases for full utilization of the two new dies, and I can't say that I have any sensible use cases. On top of this downside, you're now facing lower clocks to hit the power envelope. A mild OC on a 1950X already hits 250 W. Here that is the stock TDP, and it's spread out across even more cores.
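
To put rough numbers on the power point, here is a quick per-core division. The 180 W / 16 cores figure is the 1950X spec; 250 W / 32 cores is the rumoured TR2 figure from this thread. Uncore and fabric power are ignored, so the real per-core budgets would be lower than what this prints.

```python
# Per-core share of package TDP (uncore and fabric power ignored).
# 250 W for 32 cores is the rumoured figure, not a confirmed spec.

def watts_per_core(tdp_w: float, cores: int) -> float:
    """Naive even split of package TDP across cores."""
    return tdp_w / cores


tr_1950x = watts_per_core(180, 16)
tr2_32c = watts_per_core(250, 32)
print(f"1950X: {tr_1950x:.2f} W/core, 32-core TR2: {tr2_32c:.2f} W/core")
```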

I currently run a distributed configuration that I'd like to consolidate, but this configuration really has me scratching my head. I feel like it'd be a nightmare to work with and might actually decrease my performance.
 
  • Like
Reactions: Drazick

eek2121

Platinum Member
Aug 2, 2005
2,930
4,025
136
What clock speeds have you seen and where?

As of right now, the 32 core Threadripper is clocked at 3.0 GHz base and 3.4 GHz boost. It looks like the 32 core version is finalized, but the 24 core version is not. These clocks don't make sense given the improved PB2 + XFR changes; the boost should be up in the 4.2-4.3 GHz range. AMD really needs to work on their boosting algorithm. They would be doing a service to us all if they used a combination of a microcode update and a BIOS update to bring an enhanced PB + XFR to older parts. I've long suspected that the 'clock table' of the original Ryzen series was stored either in microcode or in the BIOS itself. The correct approach would be to monitor power draw against TDP via the active sensors and raise clocks up to the max boost clock as needed, based on core usage and TDP. Doing so would enhance performance for older parts. This would require some work, both by AMD and by board manufacturers, but would create a ton of goodwill among AMD customers. It would also hurt Intel, as in the past Intel has been known to pull anti-consumer moves (such as attempting to charge for extra features, etc.).
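
A toy sketch of the kind of policy meant here: step the clock up while measured package power is under TDP, and back off when it goes over. Every name and number is hypothetical, it ignores core usage and temperature, and it is not a description of how PB2 or XFR actually work internally.

```python
# Toy boost policy: raise the clock toward max boost while measured
# package power stays under TDP. All values are made up for illustration.

def next_clock_mhz(current_mhz: int, package_power_w: float, tdp_w: float,
                   base_mhz: int, max_boost_mhz: int, step_mhz: int = 25) -> int:
    """Step up when there is power headroom, step down when over TDP."""
    if package_power_w < tdp_w:
        target = current_mhz + step_mhz
    elif package_power_w > tdp_w:
        target = current_mhz - step_mhz
    else:
        target = current_mhz
    # Clamp to the part's advertised clock range.
    return max(base_mhz, min(max_boost_mhz, target))


clock = 3000
for power in (180.0, 200.0, 230.0, 255.0, 240.0):  # made-up sensor readings
    clock = next_clock_mhz(clock, power, tdp_w=250.0,
                           base_mhz=3000, max_boost_mhz=4300)
    print(f"{power:.0f} W -> {clock} MHz")
```

A real implementation would also need per-core load, temperature, and current limits, which is where the work for AMD and the board vendors would come in.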

Regardless, if they can get the max boost up to 2700X levels, I would consider buying the 32 core chip. As it stands, my overclocked 16 core Threadripper performs significantly better in games than the 1800X did, and pretty much ties up with a non overclocked 2700X, in some cases exceeding performance in places where cores/threads matter.

Finally, I've written a voxel tech demo that I may turn into a benchmark. It generates a world of X size and utilizes all cores. It also utilizes all cores when discarding/rendering chunks as the user moves around the world. It would be interesting to release this as a benchmark to see how well the future of gaming really compares when it comes to multi-core.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,025
136
Oh, one more thought here: They should have made 8 channel motherboards, while utilizing quad channel mode for older boards. That would provide an incentive to upgrade the board along with the chip. At any rate, anyone want to buy my 1950X, ASUS ROG Strix X399-E, 32 GB RAM? I can also throw in a broken MSI X399 carbon for good measure. :)
 

ub4ty

Senior member
Jun 21, 2017
749
898
96
The only way to guarantee "remote" accesses on gen. 1 TR would be removing all DIMMs from the other die. I don't know if anyone has tested that.
Anyway, the die to die latency itself is pretty brutal (180-256ns):

https://www.tomshardware.co.uk/amd-ryzen-threadripper-1950x-cpu,review-33976-2.html
Yeah, those latency numbers are akin to 2-socket communication latencies over QPI on older Xeons. It is indeed brutal.
notes-on-numa-architecture-4-638.jpg

low-latency-mechanical-sympathy-issues-and-solutions-41-638.jpg

Latencies%20comparison.png

I'd still love to see some benches and workloads pinned to specific cores, but this is already taxing my brain in terms of the 3-4 ranges of latency that now exist. They're really going to need to demo and explain how to run this configuration properly.
 
  • Like
Reactions: CatMerc

naukkis

Senior member
Jun 5, 2002
705
576
136
Has AMD confirmed that memory arrangement? As they have 4 dies and 4-channel memory, it is also easy to implement a 1-channel-per-die memory configuration, which makes every die equal to the others instead of the odd configuration of two dies with local memory only and two with far memory only.
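
A small sketch comparing the two layouts by how much of the total memory sits local to each die. Both layouts are hypothetical as far as this thread goes, and the names are made up for illustration.

```python
# Two hypothetical ways to wire 4 memory channels to 4 dies.
# Neither layout is confirmed by AMD in this thread.

TWO_DIES_FED = {0: 2, 1: 2, 2: 0, 3: 0}      # channels attached per die
ONE_CHANNEL_EACH = {0: 1, 1: 1, 2: 1, 3: 1}


def local_share(layout: dict[int, int]) -> dict[int, float]:
    """Fraction of all memory channels that are local to each die."""
    total = sum(layout.values())
    return {die: channels / total for die, channels in layout.items()}


for name, layout in (("2+2+0+0", TWO_DIES_FED), ("1+1+1+1", ONE_CHANNEL_EACH)):
    shares = local_share(layout)
    print(name, {die: f"{share:.0%}" for die, share in shares.items()})
```

With memory interleaved evenly across all channels, the 1+1+1+1 layout leaves every die with 25% local and 75% remote accesses, while 2+2+0+0 gives two dies 50% local and two dies none. The dies are equal in the first case, but that alone doesn't say which layout performs better overall.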