AMD EPYC Server Processor Thread - EPYC 7000 series specs and performance leaked


Ajay

Lifer
Jan 8, 2001
Run-of-the-mill x86 server CPUs can't handle HPC? Then tell me what is actually used for HPC, other than supercomputers?

Poorly phrased - meant run of the mill x86 servers. But it is true that most of the grunt for HPC comes from GPU accelerators.
 

tamz_msc

Diamond Member
Jan 5, 2017
Ajay said:
Poorly phrased - meant run of the mill x86 servers. But it is true that most of the grunt for HPC comes from GPU accelerators.
Wrong.

GPUs are suited very well for some kinds of workloads - when your data is regular, when bringing that data to the ALUs is already taken care of, and when the problem is highly parallelizable. In the real world, you've got to worry about things like Amdahl's law, memory and interface bandwidth, how you issue instructions, etc. On top of that, GPUs suck miserably when it comes to double-precision performance, and are therefore lacking in a crucial aspect of HPC.
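
To put a rough number on the Amdahl's law point, here's a minimal sketch - the offload fractions and the 100x kernel speedup are made up purely for illustration:

Code:
# Amdahl's law: overall speedup when only a fraction p of the runtime
# can be accelerated, and that fraction gets a local speedup of s.
def amdahl_speedup(p, s):
    return 1.0 / ((1.0 - p) + p / s)

# Even with a 100x GPU kernel speedup, offloading 80% of the runtime
# caps the whole job at roughly 4.8x.
for p in (0.5, 0.8, 0.95):
    print(f"p={p:.2f}: {amdahl_speedup(p, 100):.1f}x overall")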

When you've got to plough a field, what would you rather have: four bullocks or four thousand chickens?
 

Ajay

Lifer
Jan 8, 2001
I like Gustafson's law better. The big problem for GPUs and DP is latency and bandwidth, hence the move of top GPUs to HBM memory (and faster interconnects to the CPU). Every component is superior in a particular problem domain. They don't hang all those GPUs off of HPC machines for nothing.
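
Just to make the contrast concrete, a quick sketch of Gustafson's scaled speedup - the 95%-parallel fraction here is illustrative, not measured:

Code:
# Gustafson's law: scaled speedup when the problem size grows with the
# number of processors N and a fraction (1 - p) of the work stays serial.
def gustafson_speedup(n, p):
    return (1.0 - p) + p * n

# With 95% parallel work, 1000 processors give ~950x scaled speedup,
# versus the ~20x ceiling Amdahl's law puts on a fixed-size problem.
print(f"{gustafson_speedup(1000, 0.95):.0f}x")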
 

tamz_msc

Diamond Member
Jan 5, 2017
Ajay said:
I like Gustafson's law better. The big problem for GPUs and DP is latency and bandwidth, hence the move of top GPUs to HBM memory (and faster interconnects to the CPU). Every component is superior in a particular problem domain. They don't hang all those GPUs off of HPC machines for nothing.
GPUs are used as accelerators, which means their function is to speed up specific kernels which constitute only a part of the overall computational task.

They hang off clusters with hundreds of nodes containing thousands of x86 cores, which do the bulk of the heavy lifting.
 

Ajay

Lifer
Jan 8, 2001
tamz_msc said:
GPUs are used as accelerators, which means their function is to speed up specific kernels which constitute only a part of the overall computational task.

They hang off clusters with hundreds of nodes containing thousands of x86 cores, which do the bulk of the heavy lifting.

Fair enough, I may need to do some homework...
 

Tuna-Fish

Golden Member
Mar 4, 2011
tamz_msc said:
GPUs are used as accelerators, which means their function is to speed up specific kernels which constitute only a part of the overall computational task.

They hang off clusters with hundreds of nodes containing thousands of x86 cores, which do the bulk of the heavy lifting.

This depends on the domain. In the one workload I've seen up close where lots of GPUs were used, fluid dynamics for weather forecasting, the GPUs did practically all the work and the CPUs were there just to shuffle data around.
 

tamz_msc

Diamond Member
Jan 5, 2017
Tuna-Fish said:
This depends on the domain. In the one workload I've seen up close where lots of GPUs were used, fluid dynamics for weather forecasting, the GPUs did practically all the work and the CPUs were there just to shuffle data around.
It depends on the fraction of the overall code occupied by the linear solver - the higher the fraction, the better the speedup GPUs can achieve - see page 18 of this PDF. Obviously this is going to be domain-specific, like you say, but it is important to note that even where we expect GPUs to be ahead, a multi-core CPU is not far behind:
https://www.cfd-online.com/Forums/hardware/187098-gpu-acceleration-ansys-fluent.html
 

Gideon

Golden Member
Nov 27, 2007
That's truly impressive. Especially when you consider that they tested the dual-socket $4200 EPYC 7601 in a single socket - a configuration you'll rarely see in real systems, as the single-socket EPYC 7551P is almost identical (just a 200 MHz clock-speed deficit) and costs only $2100.

And if the customer is memory-bandwidth or IO bound, then even the 24-core $1075 EPYC 7401P will do nicely.
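
A quick price-per-core check from the list prices above (core counts assumed: 32 for the 7601 and 7551P, 24 for the 7401P):

Code:
# Rough price-per-core comparison using the quoted list prices.
parts = {
    "EPYC 7601":  (4200, 32),   # dual-socket capable
    "EPYC 7551P": (2100, 32),   # single-socket only
    "EPYC 7401P": (1075, 24),   # single-socket only
}
for name, (price, cores) in parts.items():
    print(f"{name}: ${price / cores:.0f} per core")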
 

raghu78

Diamond Member
Aug 23, 2012
Phoronix: Initial Benchmarks Of The AMD EPYC 7601 On Ubuntu Linux

AMD EPYC 7601 (32C/64T) vs. 2x Intel Xeon Gold 6138 (40C/80T)

http://www.phoronix.com/scan.php?page=article&item=epyc-7601-linux&num=4

Looking good for AMD to make serious server market share gains in 2018. AMD's roadmap is pretty strong and there are already rumours of 7nm chip tapeouts.

http://www.barrons.com/articles/amds-epyc-design-impresses-says-canaccord-7-nano-is-next-1504905645

"Further, we anticipate these new 7nm products will compete with the 10nm roadmap from Intel. Given our view – shared by Mark Papermaster in our recent discussions – that 7nm foundry silicon is roughly equivalent to Intel's 10nm node in terms of realized performance/density/power, we believe another significant shift in the competitiveness of AMD's products versus Intel's will take place in 2018/19, leaving AMD at process node parity with Intel for the first time in well over a decade. We continue to believe investors underestimate the importance of this roadmap development for AMD's second-generation Zen-based products and primarily focus only on nearer-term results. Should AMD execute the 7nm roadmaps in 2018/19, we believe more substantial share gains could result. In addition, industry contacts indicate datacenter customers watch 7nm developments closely under NDAs and we believe AMD has already "taped out" multiple 7nm chips with positive early indications."

One thing is sure: Intel will face much more competitive pressure in 2019, as GF 7LP is far more competitive against Intel 10nm than GF 14LPP was against Intel 14nm. GF 7LP is designed for very high performance (5 GHz operation) compared to GF 14LPP (3.5 GHz), and its contacted gate pitch x minimum metal pitch (CPP x MMP) metric is much closer to Intel 10nm than GF 14LPP's was to Intel 14nm - the numbers and a quick ratio check are below.

https://www.semiwiki.com/forum/cont...alfoundries-discloses-7nm-process-detail.html
https://www.semiwiki.com/forum/content/6713-14nm-16nm-10nm-7nm-what-we-know-now.html

Intel 14nm
CPP - 70 nm
MMP - 52 nm
CPP * MMP = 70 * 52 = 3640

Intel 10nm
CPP - 54 nm
MMP - 36 nm
CPP * MMP = 54 * 36 = 1944

GF 14LPP
CPP - 78 nm
MMP - 64 nm
CPP * MMP = 78 * 64 = 4992

GF 7LP
CPP - 56 nm
MMP - 40 nm
CPP * MMP = 56 * 40 = 2240
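
Dividing those out shows how much the gap narrows - same figures as above, just expressed as ratios:

Code:
# How much larger each GF node's CPP x MMP is than the Intel node it competes with.
intel_14nm = 70 * 52   # 3640
intel_10nm = 54 * 36   # 1944
gf_14lpp   = 78 * 64   # 4992
gf_7lp     = 56 * 40   # 2240

print(f"GF 14LPP vs Intel 14nm: {gf_14lpp / intel_14nm:.2f}x")  # ~1.37x
print(f"GF 7LP vs Intel 10nm:   {gf_7lp / intel_10nm:.2f}x")    # ~1.15x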
 

Topweasel

Diamond Member
Oct 19, 2000
Makes perfect sense really. How many 4 socket systems are there being spec'ed up these days?
How much of that is Intel's fault, though? Since all the sockets were point-to-point using QPI and didn't require any major changes to the CPU, the huge upswing in cost and its near inability to fit in 2U made it an interesting setup. It cost more and you didn't get an increase in density. I don't know why someone wouldn't look at EPYC and say, well, we can get 32C or 64C in 2U, so why not get 64C? That one system could replace my company's complete virtualization setup.

That said, 4P still has a place. My company (and my parent company) made a move to modular SAN solutions, combining SAN with compute boxes. Because the SAN portion is already so large, they are configurable with 4P solutions. That said, the memory and HDD/SSD/NVMe connectivity of EPYC and the massive core count of a 2P system would still make it a healthy option.