[pcgamesn] AMD is giving Threadripper 2 moar cores and a top TDP of 250W


ub4ty

Senior member
Jun 21, 2017
749
898
96
As of right now, the 32 core Threadripper is clocked at 3.0 base and 3.4 boost. It looks like the 32 core version is finalized, but the 24 core version is not. This doesn't make sense given the improved PB2 + XFR changes. The boost should be up in the 4.2-4.3 GHz range. AMD really needs to work on their boosting algorithm. They would be doing a service to us all if they used a combination of a microcode update and a BIOS update to deliver an enhanced PB + XFR offering. I've long suspected that the 'clock table' of the original Ryzen series was stored either in microcode or in the BIOS itself. The correct approach here would be to monitor TDP via the active sensors and push the clock up toward the max boost clock as needed, based on core usage and TDP. Doing so would enhance performance for older parts. It would require some work, both by AMD and by board manufacturers, but would create a ton of goodwill among AMD customers. It would also hurt Intel, which has been known to pull anti-consumer moves in the past (such as attempting to charge for extra features).

Regardless, if they can get the max boost up to 2700X levels, I would consider buying the 32 core chip. As it stands, my overclocked 16 core Threadripper performs significantly better in games than the 1800X did, and pretty much ties a non-overclocked 2700X, in some cases exceeding it where cores/threads matter.

Finally, I've written a voxel tech demo that I may turn into a benchmark. It generates a world of a given size and utilizes all cores. It also utilizes all cores when discarding/rendering chunks as the user moves around the world. It would be interesting to release this as a benchmark to see how the future of gaming really shapes up when it comes to multi-core.
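For illustration, here's a minimal sketch of how a demo like that might spread chunk generation across every core. This is purely hypothetical code, not the actual demo; the chunk size and the stand-in 'density' function are invented:

```python
# Hypothetical sketch: generate a grid of voxel chunks on all cores.
import os
from concurrent.futures import ProcessPoolExecutor

CHUNK = 32  # voxels per chunk edge (invented for this sketch)

def generate_chunk(coords):
    """Fill one chunk with a cheap deterministic stand-in density field."""
    cx, cy, cz = coords
    voxels = bytearray(CHUNK ** 3)
    for i in range(CHUNK ** 3):
        x, rest = divmod(i, CHUNK * CHUNK)
        y, z = divmod(rest, CHUNK)
        # any real noise function would go here
        voxels[i] = (cx * 31 + cy * 17 + cz * 13 + x + y + z) & 0xFF
    return coords, bytes(voxels)

def generate_world(size):
    """Generate a size x size x size world of chunks, one task per chunk."""
    coords = [(x, y, z) for x in range(size)
                        for y in range(size)
                        for z in range(size)]
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as pool:
        return dict(pool.map(generate_chunk, coords))

if __name__ == "__main__":
    world = generate_world(8)  # 512 independent chunks scale across cores
    print(f"generated {len(world)} chunks on {os.cpu_count()} cores")
```

The same pool approach extends to discarding/regenerating chunks as the camera moves: each chunk is an independent task, which is why this kind of workload scales with core count.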
I'd love to get my hands on such a demo. Can you give details on it? What language/dependencies? Are you likely to share the source? As for the 32 core clock and boost... I think you need to remind yourself that this is 32 cores. My current 1950X runs at 250W TDP when maxed out with just a mild OC above stock speeds. You're spreading this (even with the power reductions of 12nm) across double the number of cores. It's a slightly cut-down 32 core EPYC:
https://www.newegg.com/Product/Product.aspx?Item=N82E16819113471
which runs at 2.7/3.2.
You're already seeing the 12nm at work given that you're getting 3.0/3.4. Maybe slap on another 300-500MHz max, for an Intel-housefire-like TDP (which is probably why new Threadripper boards with beefier VRMs have arrived). I'm not sure what kind of arrangement people have, but OC'ing dumps an incredible amount of unnecessary heat into one's setup. If you're actually utilizing all of the I/O and have those PCIe slots slammed with cards, you're talking about a space heater when this thing gets going.
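Side note: the sensor-driven boost scheme argued for in the quote above boils down to a simple feedback loop. A minimal sketch, with a simulated sensor read and invented thresholds; this is not AMD's actual PB2/XFR logic:

```python
# Hypothetical TDP-governed boost loop: bump clocks while under the package
# power budget, back off when over it. Sensor and P-state hooks are stubs.
import random
import time

TDP_LIMIT_W = 250    # the rumored TR2 package budget
F_BASE_MHZ  = 3000
F_MAX_MHZ   = 4200
STEP_MHZ    = 25     # clock change per control tick

def read_package_power_w():
    # stand-in for a real SMU/board sensor read
    return 180.0 + random.uniform(0.0, 90.0)

def set_all_core_clock(mhz):
    # stand-in for a real P-state write
    pass

freq = F_BASE_MHZ
for _ in range(1000):                         # bounded loop for the sketch
    power = read_package_power_w()
    if power < 0.95 * TDP_LIMIT_W and freq < F_MAX_MHZ:
        freq += STEP_MHZ                      # headroom left: boost
    elif power > TDP_LIMIT_W and freq > F_BASE_MHZ:
        freq -= STEP_MHZ                      # over budget: back off
    set_all_core_clock(freq)
    time.sleep(0.001)                         # ~1 ms control interval
```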
 

ub4ty

Senior member
Jun 21, 2017
749
898
96
Has AMD confirmed that memory arrangement? As they have 4 dies and 4-channel memory, it would also be easy to implement a 1-channel-per-die memory configuration, which makes every die equal to the others, instead of the odd configuration of two dies with local memory only and two with far memory only.
I'd hope they would have detailed this at launch, but my thinking is there is no way in hell to feed the other dies in this manner, because I'm not sure you can have dynamic wiring to the DIMMs/PCIe and I/O. These traces are fixed in the mobo to set pins. So, there's frankly no way to physically reach the two new dies besides power from the mobo. Someone correct me if I'm wrong.
 

naukkis

Senior member
Jun 5, 2002
705
576
136
I'd hope they would have detailed this at launch, but my thinking is there is no way in hell to feed the other dies in this manner, because I'm not sure you can have dynamic wiring to the DIMMs/PCIe and I/O. These traces are fixed in the mobo to set pins. So, there's frankly no way to physically reach the two new dies besides power from the mobo. Someone correct me if I'm wrong.

They don't need dynamic wiring; they only need to route one memory channel from each of the four dies through the CPU substrate to the memory channels, instead of two channels from each of two dies. It's not a problem at all, so a configuration with two dies without local memory is such a stupid solution that I think it comes from some journalist's pen rather than from AMD, or at least I hope so.
 

ub4ty

Senior member
Jun 21, 2017
749
898
96
Oh, one more thought here: they should have made 8-channel motherboards, while utilizing quad-channel mode for older boards. That would provide an incentive to upgrade the board along with the chip. At any rate, anyone want to buy my 1950X, ASUS ROG Strix X399-E, and 32 GB RAM? I can also throw in a broken MSI X399 Carbon for good measure. :)
Something tells me, just like with the 1950X, that the dies are sourced from EPYC dies that have defects in the I/O regions. For the 2-die config, it's in the Infinity Fabric region. For this 4-die config, it's in the
- PCIe
- DDR4 memory controller
region. So, it would be physically impossible.

Otherwise, for dies with no defects, what you describe is just an EPYC chip. So, I don't think they would or could have done this. How did you break a mobo, btw? *asking for a friend
 

ub4ty

Senior member
Jun 21, 2017
749
898
96
They don't need dynamic wiring; they only need to route one memory channel from each of the four dies through the CPU substrate to the memory channels, instead of two channels from each of two dies. It's not a problem at all, so a configuration with two dies without local memory is such a stupid solution that I think it comes from some journalist's pen rather than from AMD, or at least I hope so.
*Route? There are physical board traces from the DIMM slots to the pinouts on the CPU. When you say *routing*, as in re-routing the data once it's in the CPU complex, that's essentially what is described here:
[Image: TR_Layout_Unofficial_AT_575px.png — unofficial Threadripper 2 layout diagram]

The wire traces to the CPU pinouts are set; they aren't dynamic. Once the data is in the CPU complex you can re-route it, sure, but that's the problem we're discussing, and the latency therein. The above diagram is most likely 100% accurate, save for the logistics of how the data routing is done. Suffice to say, this is definitely a head scratcher and needs a range of use cases to prove its viability.
 

ub4ty

Senior member
Jun 21, 2017
749
898
96
New cooler (Cooler Master I think) designed for Threadripper.

[Image: dn18j0tm2e211.jpg — photo of the new Threadripper cooler]
I'll stick w/ my Noctua. Although, I wish they had better mating of the heatpipes w/ the contact plate. It seems a single fan is wedged in between there?
 

naukkis

Senior member
Jun 5, 2002
705
576
136
*Route? There are physical board traces from the DIMM slots to the pinouts on the CPU. When you say *routing*, as in re-routing the data once it's in the CPU complex, that's essentially what is described here:
[Image: TR_Layout_Unofficial_AT_575px.png — unofficial Threadripper 2 layout diagram]

The wire traces to the CPU pinouts are set; they aren't dynamic. Once the data is in the CPU complex you can re-route it, sure, but that's the problem we're discussing, and the latency therein. The above diagram is most likely 100% accurate, save for the logistics of how the data routing is done. Suffice to say, this is definitely a head scratcher and needs a range of use cases to prove its viability.

Between the CPU dies and the motherboard is a substrate which routes signals from the dies to the pins. There's no problem at all reconfiguring those as wanted.
 

ub4ty

Senior member
Jun 21, 2017
749
898
96
Between the CPU dies and the motherboard is a substrate which routes signals from the dies to the pins. There's no problem at all reconfiguring those as wanted.
Interesting. Thank you for this bit of information; I was unaware of that. Could you link me a source if possible?
More and more, it's clear to me that I should wait until official details come out.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
Between the CPU dies and the motherboard is a substrate which routes signals from the dies to the pins. There's no problem at all reconfiguring those as wanted.

The CPU substrate isn't the problem; the topology of the existing motherboards is.
The existing boards are 2 DPC (two DIMMs per channel), so A0 & A1, B0 & B1, C0 & C1 and D0 & D1 share the signals.
 
  • Like
Reactions: Drazick and CatMerc

naukkis

Senior member
Jun 5, 2002
705
576
136
Interesting. Thank you for this bit of information; I was unaware of that. Could you link me a source if possible?
More and more, it's clear to me that I should wait until official details come out.

It's just how flip-chip CPUs are made:

http://electronicpackaging.asmedigitalcollection.asme.org/article.aspx?articleid=2532707

That substrate dictates how the die connects to the CPU pins, and it's normal to use one die in many different sockets. Intel did that recently with Kaby Lake: they used the same die with a 2-channel socket 1151 substrate, and with a different substrate that same die could be used in a 4-channel memory motherboard (LGA 2066), of course with only 2 memory channels, as the die didn't have more. In the same way, AMD could wire 1 channel from each of the 4 dies to the memory channels on the 4-die Threadrippers, and if they don't do it that way they are...... ?
 
  • Like
Reactions: ub4ty

naukkis

Senior member
Jun 5, 2002
705
576
136
The CPU substrate isn't the problem; the topology of the existing motherboards is.
The existing boards are 2 DPC (two DIMMs per channel), so A0 & A1, B0 & B1, C0 & C1 and D0 & D1 share the signals.

Every memory channel has two memory sockets; what's the problem?
 

TassadarL

Junior Member
Jun 6, 2018
4
2
16
What I tested: I used dual-channel 2933MHz RAM on Die#0 and no RAM on Die#1. In a compression test, Die#1 only performed at 70% of what Die#0 did.
 

CatMerc

Golden Member
Jul 16, 2016
1,114
1,149
136

ub4ty

Senior member
Jun 21, 2017
749
898
96
What I tested: I used dual-channel 2933MHz RAM on Die#0 and no RAM on Die#1. In a compression test, Die#1 only performed at 70% of what Die#0 did.
Very interesting, and not bad. My question would be what happens when both dies try to suck through the same straw vs. one die at a time. Both dies drawing down on the DIMMs is more realistic (if the config is as described by AnandTech), because if I max out all 32 cores you essentially have two dies fighting for the same memory, whereas in a proper EPYC config, with all dies individually fed, everyone has their own straw and pool to suck from.

I'm really enjoying the discussion on this and I'm learning a lot. However, ultimately AMD should go into great detail on this issue, with benchmarks and demonstrated use cases showing how to most efficiently utilize this weird combo. I guess this gets resolved a bit if it is as naukkis says and each die gets its own half straw?
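For what it's worth, the "one straw vs. two" comparison can be roughed out with pinned memory-hammering workers. A minimal Linux-only sketch; the die-to-core ranges are assumptions (check lstopo or numactl --hardware for the real mapping), and note that pinning CPUs alone doesn't bind memory to a NUMA node the way numactl --membind does:

```python
# Hypothetical contention test: measure aggregate memory throughput with
# workers pinned to one die, then to both dies. Core ranges are assumed.
import os
import time
from multiprocessing import Process, Queue

DIE0_CPUS = set(range(0, 8))    # assumption: these CPUs sit on die 0
DIE1_CPUS = set(range(8, 16))   # assumption: these CPUs sit on die 1
BUF_MB = 256
PASSES = 20

def hammer(cpus, out):
    os.sched_setaffinity(0, cpus)      # pin this worker to one die's CPUs
    buf = bytearray(BUF_MB * 1024 * 1024)
    t0 = time.perf_counter()
    for _ in range(PASSES):
        buf[:] = buf[::-1]             # force full-buffer memory traffic
    out.put(BUF_MB * PASSES / (time.perf_counter() - t0))

def run(label, pinnings):
    q = Queue()
    procs = [Process(target=hammer, args=(cpus, q)) for cpus in pinnings]
    for p in procs: p.start()
    for p in procs: p.join()
    total = sum(q.get() for _ in procs)
    print(f"{label}: ~{total:.0f} MB/s aggregate")

if __name__ == "__main__":
    run("die 0 alone", [DIE0_CPUS])
    run("both dies", [DIE0_CPUS, DIE1_CPUS])
```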
 
  • Like
Reactions: CatMerc

CatMerc

Golden Member
Jul 16, 2016
1,114
1,149
136
As of right now, the 32 core Threadripper is clocked at 3.0 base and 3.4 boost. It looks like the 32 core version is finalized, but the 24 core version is not. This doesn't make sense given the improved PB2 + XFR changes. The boost should be up in the 4.2-4.3 GHz range. AMD really needs to work on their boosting algorithm. They would be doing a service to us all if they used a combination of a microcode update and a BIOS update to deliver an enhanced PB + XFR offering. I've long suspected that the 'clock table' of the original Ryzen series was stored either in microcode or in the BIOS itself. The correct approach here would be to monitor TDP via the active sensors and push the clock up toward the max boost clock as needed, based on core usage and TDP. Doing so would enhance performance for older parts. It would require some work, both by AMD and by board manufacturers, but would create a ton of goodwill among AMD customers. It would also hurt Intel, which has been known to pull anti-consumer moves in the past (such as attempting to charge for extra features).

Regardless, if they can get the max boost up to 2700X levels, I would consider buying the 32 core chip. As it stands, my overclocked 16 core Threadripper performs significantly better in games than the 1800X did, and pretty much ties a non-overclocked 2700X, in some cases exceeding it where cores/threads matter.

Finally, I've written a voxel tech demo that I may turn into a benchmark. It generates a world of a given size and utilizes all cores. It also utilizes all cores when discarding/rendering chunks as the user moves around the world. It would be interesting to release this as a benchmark to see how the future of gaming really shapes up when it comes to multi-core.
Neither is finalized. Both turbo numbers were listed as WIP. And they were listed as ALL CORE turbo, btw.
 

TassadarL

Junior Member
Jun 6, 2018
4
2
16
Very interesting, and not bad. My question would be what happens when both dies try to suck through the same straw vs. one die at a time. Both dies drawing down on the DIMMs is more realistic (if the config is as described by AnandTech), because if I max out all 32 cores you essentially have two dies fighting for the same memory, whereas in a proper EPYC config, with all dies individually fed, everyone has their own straw and pool to suck from.

I'm really enjoying the discussion on this and I'm learning a lot. However, ultimately AMD should go into great detail on this issue, with benchmarks and demonstrated use cases showing how to most efficiently utilize this weird combo. I guess this gets resolved a bit if it is as naukkis says and each die gets its own half straw?
This performance is too bad for normal use. If you use the same method in games, using Die#1 to play a game, you can see a significant jump in frame times. I will test "both dies trying to suck through the same straw" today with this setup. You can think of EPYC as one single CPU where each die is a "core" and the Infinity Fabric bus is the "L3", which is synced to RAM speed. Then you can easily picture what TR2 looks like with 4 "cores" but only 2 memory controllers.
 

TassadarL

Junior Member
Jun 6, 2018
4
2
16
Very interesting, and not bad. My question would be what happens when both dies try to suck through the same straw vs. one die at a time. Both dies drawing down on the DIMMs is more realistic (if the config is as described by AnandTech), because if I max out all 32 cores you essentially have two dies fighting for the same memory, whereas in a proper EPYC config, with all dies individually fed, everyone has their own straw and pool to suck from.

I'm really enjoying the discussion on this and I'm learning a lot. However, ultimately AMD should go into great detail on this issue, with benchmarks and demonstrated use cases showing how to most efficiently utilize this weird combo. I guess this gets resolved a bit if it is as naukkis says and each die gets its own half straw?
I tested a game using the single-channel-per-die setup and set affinity across both dies, with DIE 0 using 8 threads and DIE 1 using 8 threads together; the FPS was much worse than with the memory set to 2+0.
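A minimal sketch of how that kind of affinity split can be scripted on Linux. The benchmark binary and the core numbering are hypothetical, and on a 16-core/32-thread part the SMT sibling layout matters too:

```python
# Launch the same workload pinned across both dies (8 threads each) or
# confined to one die, then compare frame times. Core ranges are assumed.
import os
import subprocess

SPLIT_CPUS   = set(range(0, 8)) | set(range(16, 24))  # 8 per die (assumed)
ONE_DIE_CPUS = set(range(0, 16))                      # all on die 0 (assumed)

def launch_pinned(cmd, cpus):
    proc = subprocess.Popen(cmd)
    os.sched_setaffinity(proc.pid, cpus)  # restrict the child to these CPUs
    return proc

# hypothetical benchmark binary:
# launch_pinned(["./game_benchmark"], SPLIT_CPUS).wait()
# launch_pinned(["./game_benchmark"], ONE_DIE_CPUS).wait()
```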
 

TassadarL

Junior Member
Jun 6, 2018
4
2
16
Very interesting, and not bad. My question would be what happens when both dies try to suck through the same straw vs. one die at a time. Both dies drawing down on the DIMMs is more realistic (if the config is as described by AnandTech), because if I max out all 32 cores you essentially have two dies fighting for the same memory, whereas in a proper EPYC config, with all dies individually fed, everyone has their own straw and pool to suck from.

I'm really enjoying the discussion on this and I'm learning a lot. However, ultimately AMD should go into great detail on this issue, with benchmarks and demonstrated use cases showing how to most efficiently utilize this weird combo. I guess this gets resolved a bit if it is as naukkis says and each die gets its own half straw?
Well, the only thing I really want, or care about, is whether AMD will offer an 8-channel desktop motherboard, as an 8-channel X399 or an X499.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
Every memory channel has two memory sockets; what's the problem?

The fact that the signals are shared between the two slots of each channel, apart from the CAD and clock pins.
To support individual channels, each of the slots would need to have its own signals routed on the board.

The CPU (package) could do it, and the socket obviously can do it, but the existing motherboards cannot.
 

naukkis

Senior member
Jun 5, 2002
705
576
136
The fact that the signals are shared between the two slots of each channel, apart from the CAD and clock pins.
To support individual channels, each of the slots would need to have its own signals routed on the board.

The CPU (package) could do it, and the socket obviously can do it, but the existing motherboards cannot.

What signals are shared between channels? Even when two channels come from one die, they both work independently of each other, so not much can be shared between them.

Are you mixing this up with an 8-channel system? Obviously X399 boards only have 4 channels, but is there something that prevents using one channel per die (two memory slots for every channel, i.e. 2 DIMMs per channel) so that every die could have local memory? The system could then still be a real NUMA system, which a CPU with dies that have no locally attached memory could not be. If AMD's solution really is two dies without local memory, that configuration would be unique among anything out there and would need its own complex performance optimizations in both the OS and programs themselves.
 

StinkyPinky

Diamond Member
Jul 6, 2002
6,763
783
126
They said that?

It is on AnandTech's website. Not sure if that is official or just speculation. These are also samples, so final clocks may be slightly higher.

I'm also hoping for a new 16 core version with higher clocks and perhaps two dummy dies again.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,025
136
It is on AnandTech's website. Not sure if that is official or just speculation. These are also samples, so final clocks may be slightly higher.

I'm also hoping for a new 16 core version with higher clocks and perhaps two dummy dies again.

It amazes me how many people here browse the forums but not the site. I am also hoping for a 16 core version, though if they can get the clocks up (if AT is wrong), then I'd consider a 24 or 32 core version.
 

The Stilt

Golden Member
Dec 5, 2015
1,709
3,057
106
What signals are shared between channels? Even when two channels come from one die, they both work independently of each other, so not much can be shared between them.

Are you mixing this up with an 8-channel system? Obviously X399 boards only have 4 channels, but is there something that prevents using one channel per die (two memory slots for every channel, i.e. 2 DIMMs per channel) so that every die could have local memory? The system could then still be a real NUMA system, which a CPU with dies that have no locally attached memory could not be. If AMD's solution really is two dies without local memory, that configuration would be unique among anything out there and would need its own complex performance optimizations in both the OS and programs themselves.

Slots of the same memory channel (e.g. A0 & A1) share all but 10 signals (CAD).
You can look at any 2 DPC board schematic you can find to check that.

Slots for the different channels (e.g. A0 & B0) share none, except the SMBUS ones for the SPD device.

All of the existing X399 boards with eight slots on them are 2 DPC (A0 & A1, B0 & B1, C0 & C1, D0 & D1), while an EPYC or 8CH TR would need to have 1 DPC (or 2 DPC with 16 slots) with A0, B0, C0, D0, E0, F0, G0, H0 configuration.

This is for a DDR3 board, but the DIMM type really makes no difference in this regard.

[Images: 8UtHDtM.jpg, y2AE4AI.jpg — 2 DPC board schematic excerpts]