Info [Toms, Anand] AMD EPYC Benchmarks

stockolicious · Aug 17, 2019

Abwx said:
It wasn't Dell but HP, and 100k chips. Btw, Dell has always been completely sold out to Intel; the proof is that they're the only ones who are not ready on day one of the Rome launch. I guess they are more early-info providers for Intel than anything else...

I've read this about Dell in the past, 1 million chips I think. You can check the AdoredTV video about Intel's monopolistic practices to back this up, but I guess we both get the point.

nicalandia · Aug 17, 2019

Just a Quick Question, all of the Epyc Rome systems are UMA right? Except for the 2P (two-socket) system? Someone mentioned that all but the 64-core Rome Epycs were UMA (Uniform Memory Access).

DrMrLordX · Aug 17, 2019

All Rome chips should be UMA within a single socket, including the 64c chips. There's one memory controller to rule them all, and on the substrate bind them.

Unless 64c EPYC has two I/O dice. I don't think it does.

naukkis · Aug 17, 2019

All Rome chips can also be configured as 4 NUMA nodes per socket: each pair of chiplets gets a quarter of the IO die and two memory channels. Memory latency will improve by ~25 ns and memory throughput by some 10%, but with the additional hassle of more NUMA nodes. Such a configuration option also suggests that the Rome IO chip is practically 4 Ryzen IO chips linked together.
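A rough sketch of the split described here, not AMD's actual routing: it divides the 8 chiplets and 8 memory channels of a 64-core part evenly across 4 NUMA nodes. The function name `partition_nps` and the index ordering are made up for illustration; which physical chiplet sits next to which quarter of the IO die is an assumption.

```python
def partition_nps(chiplets: int = 8, channels: int = 8, nodes: int = 4) -> dict[int, dict[str, list[int]]]:
    """Evenly split chiplets and memory channels across NUMA nodes.

    Index ordering is illustrative only; the physical chiplet-to-quadrant
    mapping is an assumption.
    """
    if chiplets % nodes or channels % nodes:
        raise ValueError("chiplets and channels must divide evenly across nodes")
    ccd_per_node = chiplets // nodes
    ch_per_node = channels // nodes
    return {
        n: {
            "chiplets": list(range(n * ccd_per_node, (n + 1) * ccd_per_node)),
            "channels": list(range(n * ch_per_node, (n + 1) * ch_per_node)),
        }
        for n in range(nodes)
    }


layout = partition_nps()
for node, res in layout.items():
    print(f"node {node}: chiplets {res['chiplets']}, channels {res['channels']}")
```

With the defaults, every node ends up with two chiplets and two memory channels, which is the "quarter of the IO die" arrangement. Passing `nodes=1` gives the default single-node view, where all chiplets share all eight channels.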

StefanR5R · Aug 17, 2019

DrMrLordX said:
All Rome chips should be UMA within a single socket, including the 64c chips. There's one memory controller to rule them all, and on the substrate bind them.

There are actually four 2-channel memory controllers in the IOD. Different CCDs experience slightly different latencies in memory accesses behind each of the four controllers.

DrMrLordX said:
Unless 64c EPYC has two I/O dice. I don't think it does.

All Epyc 7002 have just one IOD. (Putting two into one wouldn't bring any benefit without changing the socket also.)

naukkis said:
All Rome chips can also be configured as 4 NUMA nodes per socket: each pair of chiplets gets a quarter of the IO die and two memory channels. Memory latency will improve by ~25 ns and memory throughput by some 10%, but with the additional hassle of more NUMA nodes.

Indeed. Though on one hand, in many usage scenarios the higher count of NUMA nodes in the system is not a problem; on the other hand, the slightly decreased memory performance in the default UMA configuration is of negligible effect in most common workloads.

naukkis said:
Such a configuration option also suggests that the Rome IO chip is practically 4 Ryzen IO chips linked together.

Perhaps, but the actual implementation could be a bit more complex than that.

Yotsugi · Aug 17, 2019

nicalandia said:
Just a Quick Question, all of the Epyc Rome systems are UMA right?

Yes.

DrMrLordX said:
All Rome chips should be UMA within a single socket

You can still not-SNC up to 4 nodes per Rome package.

nicalandia · Aug 17, 2019

naukkis said:
All Rome chips can also be configured as 4 NUMA nodes per socket: each pair of chiplets gets a quarter of the IO die and two memory channels. Memory latency will improve by ~25 ns and memory throughput by some 10%, but with the additional hassle of more NUMA nodes. Such a configuration option also suggests that the Rome IO chip is practically 4 Ryzen IO chips linked together.

One Fat IO Chiplet to rule them all indeed.

Do you know if the new chipset for TR4 will use a Mirror Copy of the IO Chip like the X570 does for AM4? No news on new SP3 chipsets yet?

Yotsugi · Aug 17, 2019

nicalandia said:
Do you know if the new chipset for TR4 will use a Mirror Copy of the IO Chip like the X570 does for AM4?

Why wouldn't it?
Design reuse is neat.

naukkis · Aug 17, 2019

nicalandia said:
One Fat IO Chiplet to rule them all indeed.

Do you know if the new chipset for TR4 will use a Mirror Copy of the IO Chip like the X570 does for AM4? No news on new SP3 chipsets yet?

Actually, X570 is interesting, as AMD did not give details on what it actually is. It's said to be the Ryzen IO chip but built on 14nm instead of 12nm. So did AMD plan to produce the Ryzen IO chip on 14nm, then respin it to 12nm and use the already established 14nm lines to produce X570, or is X570 a piece of Rome's IO chip? Because it's plainly obvious that they wouldn't set up a separate production line for X570 when they could have used the 12nm IO chip that is already in production.

Which leads to the question of what the Threadripper IO chip will be. Produced on 12nm it could provide higher memory clocks, which are beneficial, but for economic reasons they would want to recycle the Rome IO chip.

DrMrLordX · Aug 17, 2019

StefanR5R said:
There are actually four 2-channel memory controllers in the IOD. Different CCDs experience slightly different latencies in memory accesses behind each of the four controllers.

Oh okay, thought it was one 8-channel controller. Which means . . .

StefanR5R said:
Indeed. Though on one hand, in many usage scenarios the higher count of NUMA nodes in the system is not a problem; on the other hand, the slightly decreased memory performance in the default UMA configuration is of negligible effect in most common workloads.

Huh, didn't know you could force NUMA instead of UMA. That's kinda weird.

nicalandia · Aug 17, 2019

naukkis said:
Actually, X570 is interesting, as AMD did not give details on what it actually is. It's said to be the Ryzen IO chip but built on 14nm instead of 12nm. So did AMD plan to produce the Ryzen IO chip on 14nm, then respin it to 12nm and use the already established 14nm lines to produce X570, or is X570 a piece of Rome's IO chip? Because it's plainly obvious that they wouldn't set up a separate production line for X570 when they could have used the 12nm IO chip that is already in production.

Which leads to the question of what the Threadripper IO chip will be. Produced on 12nm it could provide higher memory clocks, which are beneficial, but for economic reasons they would want to recycle the Rome IO chip.

I was reading that the change to 12nm on Ryzen 3000 was to lower the power envelope, as the AM4 platform is already constrained, but for TR4 and SP3 I don't believe that is the case. Both the 12nm and 14nm parts are built at GlobalFoundries.

JoeRambo · Aug 17, 2019

DrMrLordX said:
Huh, didn't know you could force NUMA instead of UMA. That's kinda weird.

It is a common technique in these big chips. Intel Skylake-SP also has this feature, where the die is split into two halves, each having access to half of the memory controllers.
This mode has two benefits. The obvious one is lower latency for accesses that stay on the local half of the die, but there is also less load on the internal interconnects to the other side of the die, and that further helps with latencies and bandwidth.

The drawback is that the workload needs to be NUMA-aware (or be made so with something like numactl) AND fit within both a single NUMA node's memory (half of what is connected to the CPU in the case of SKL-SP) and its bandwidth (also half).
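To see what a NUMA-aware workload has to deal with, here is a minimal sketch that lists the NUMA nodes Linux exposes and the CPUs in each. It assumes a Linux system with the standard sysfs layout under `/sys/devices/system/node`; the helper names `parse_cpulist` and `numa_nodes` are made up for this example.

```python
import glob
import os
import re


def parse_cpulist(text: str) -> list[int]:
    """Expand a Linux cpulist string such as '0-3,8-11' into CPU ids."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_nodes(root: str = "/sys/devices/system/node") -> dict[int, list[int]]:
    """Map NUMA node id -> CPU ids; empty dict if sysfs has no node info."""
    nodes: dict[int, list[int]] = {}
    for path in glob.glob(os.path.join(root, "node[0-9]*")):
        match = re.search(r"node(\d+)$", path)
        if match is None:
            continue
        try:
            with open(os.path.join(path, "cpulist")) as fh:
                nodes[int(match.group(1))] = parse_cpulist(fh.read())
        except OSError:
            continue  # node directory without a readable cpulist
    return nodes


if __name__ == "__main__":
    for node, cpus in sorted(numa_nodes().items()):
        print(f"node {node}: {len(cpus)} CPUs")
```

A job that is not NUMA-aware can then be confined to one node from the outside, for example with `numactl --cpunodebind=0 --membind=0 <command>`, as long as it fits in that node's share of memory and bandwidth.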

Asterox · Aug 17, 2019

"Quite expensive single socket ATX combination". But luckily there is Intel competition, that is likely to offer a more comfortable price=performance ratio.

https://www.newegg.com/promotions/s...NC-YouTube-_-NeweggInsider-_-AMD-_-Supermicro

https://www.newegg.com/supermicro-m...813183689?Item=N82E16813183689&Tpk=13-183-689

Markfw · Aug 17, 2019

Asterox said:
"Quite expensive single socket ATX combination". But luckily there is Intel competition, that is likely to offer a more comfortable price=performance ratio.

https://www.newegg.com/promotions/s...NC-YouTube-_-NeweggInsider-_-AMD-_-Supermicro

https://www.newegg.com/supermicro-m...813183689?Item=N82E16813183689&Tpk=13-183-689

Not fair that you tempt me like this. 64 cores and motherboard for $5400......

DrMrLordX · Aug 17, 2019

Might need to have an intervention. Put the wallet down sir!

Markfw · Aug 17, 2019

DrMrLordX said:
Might need to have an intervention. Put the wallet down sir!

I put myself on auto-notify for the 128 core 256 thread dual socket one..... I have the case and power supply required sitting idle. I just need memory. I even have an extra NVMe drive!

nicalandia · Aug 17, 2019

Markfw said:
I put myself on auto-notify for the 128 core 256 thread dual socket one....

With this new design (chiplets and central IO), what would be the core limit per CPU of future Zen3/4/5 (if they keep the 4ccx/8c chiplet)? For example, if they get down to 5nm? According to this article: https://hothardware.com/news/tsmc-5nm-node-doubles-density-amd-ryzen-3000-7nm the density is nearly doubled. Would that mean a 128-core, 256-thread single-socket Epyc CPU? A 32-core mainstream CPU? Perhaps that would be restricted to data center CPUs, as I can't even fathom such a massive CPU for mainstream.
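The arithmetic behind those numbers, as a back-of-the-envelope sketch. It assumes that doubled density translates directly into twice the cores per chiplet at the same chiplet count, and it ignores power, memory bandwidth and socket limits; `package_cores` is a made-up helper name.

```python
def package_cores(chiplets: int, cores_per_chiplet: int, smt: int = 2) -> tuple[int, int]:
    """Return (cores, threads) for a package built from identical chiplets."""
    cores = chiplets * cores_per_chiplet
    return cores, cores * smt


# Today: 64-core Rome uses 8 chiplets of 8 cores each.
print(package_cores(8, 8))    # (64, 128)

# Assumption: 5nm doubles cores per chiplet to 16, chiplet count unchanged.
print(package_cores(8, 16))   # (128, 256)

# Same assumption applied to a 2-chiplet mainstream package.
print(package_cores(2, 16))   # (32, 64)
```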

DarthKyrie · Aug 17, 2019

Markfw said:
I put myself on auto-notify for the 128 core 256 thread dual socket one..... I have the case and power supply required sitting idle. I just need memory. I even have an extra NVMe drive!

I knew you wouldn't be able to pass that up.

DarthKyrie · Aug 17, 2019

nicalandia said:
With this new design (chiplets and central IO), what would be the core limit per CPU of future Zen3/4/5 (if they keep the 4ccx/8c chiplet)? For example, if they get down to 5nm? According to this article: https://hothardware.com/news/tsmc-5nm-node-doubles-density-amd-ryzen-3000-7nm the density is nearly doubled. Would that mean a 128-core, 256-thread single-socket Epyc CPU? A 32-core mainstream CPU? Perhaps that would be restricted to data center CPUs, as I can't even fathom such a massive CPU for mainstream.

It depends on how long it takes them to start doing the I/O on 7EUV instead of 14/12nm. I think that DDR memory stacking on the I/O die will come first.

Markfw · Aug 17, 2019

DarthKyrie said:
I knew you wouldn't be able to pass that up.

Actually, due to the price, I may get two 64-core Threadrippers; I already have the motherboards and coolers. Just want an option....

Panino Manino · Aug 17, 2019

DrMrLordX said:
All Rome chips should be UMA within a single socket, including the 64c chips. There's one memory controller to rule them all, and on the substrate bind them.

Unless 64c EPYC has two I/O dice. I don't think it does.

DarthKyrie · Aug 17, 2019

Markfw said:
Actually, due to the price, I may get two 64-core Threadrippers; I already have the motherboards and coolers. Just want an option....

Has AMD confirmed that Zen2 TR will work in X399 or not? I haven't seen anything on the matter.

nicalandia · Aug 17, 2019

DarthKyrie said:
It depends on how long it takes them to start doing the I/O on 7EUV instead of 14/12nm. I think that DDR memory stacking on the I/O die will come first.

I don't think they will be doing a 7nm IO chip at all, since the reason for not doing it along with the chiplets is that it does not scale linearly, and very little would be gained over the 14/12nm process.

DrMrLordX · Aug 17, 2019

Only reason to move I/O die to any other process at the present or near future would be to finally cut off GF. 7nm would probably be suitable. Just more expensive.

DarthKyrie · Aug 17, 2019

nicalandia said:
I don't think they will be doing a 7nm IO chip at all, since the reason for not doing it along with the chiplets is that it does not scale linearly, and very little would be gained over the 14/12nm process.

The reasoning has to do with cost/benefit: once the cost of 7EUV comes down enough, they will scale it down. AFAIK some of the stuff in the I/O die is present in their GPUs, which means that those parts have been shrunk to 7nm already.
