Cascade Lake beats Rome in the race for the 2019 TACC supercomputer

Gideon

Golden Member
Nov 27, 2007
https://www.nextplatform.com/2018/08/29/cascade-lake-heart-of-2019-tacc-supercomputer/

The system will have around 8,000 nodes with the future “Cascade Lake” Xeon SP Platinum processors, specifically with the follow-on to the 28-core “Skylake” Xeon SP-8180. These are architecturally straightforward but will run every single science code with very little fear and can be live soon without code changes.

Stanzione says TACC made the decision to go with the Cascade Lake SKUs that have the higher clock rates and they expect most codes will run significantly faster. His team took a close look at other processor options, including the 7 nanometer AMD “Rome” Epyc chips coming next year, which he says were a closer frontrunner in their decision-making process. “We took a look at AMD Epyc, both Naples and certainly Rome, but with the combination of price, schedules, and performance, we felt like Cascade Lake was the way to get the best value right now. Our codes were just a little better for the time we needed this system but Rome is a promising architecture and we expect it is going to be a very good chip,” Stanzione explained.

Well, that's disappointing. So according to them, Intel's 28-core Skylake derivative (from effin 2015) is a better all-around scientific computing CPU than a 7nm 48-core Zen 2, which also has 30% more memory bandwidth.

TACC seems to be at least somewhat limited by memory bandwidth (hence the emphasis on going "one clock step up"), and they seem to be using AVX-512 heavily:
“The core counts will go up from Stampede2 some, the node count by quite a bit, and the memory bandwidth will also increase since we are going up another clock step on the DIMMs. The cache per core is about the same but with that higher clock rate—probably between 25 percent to 30 percent [for AVX-512 vector units, not headline clocks] we are making some decisions about balance and tradeoffs in terms of energy.”
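Both the "one clock step up" on the DIMMs and the "30% more memory bandwidth" figure come down to simple arithmetic: peak DRAM bandwidth per socket is channels × transfer rate × 8 bytes per transfer. A minimal sketch, assuming six DDR4-2666 channels for Skylake-SP, six DDR4-2933 channels for Cascade Lake, and an eight-channel socket (as on EPYC) held at DDR4-2666 so that only the channel count changes. These are theoretical peaks, not measured numbers.

```python
# Peak theoretical DRAM bandwidth per socket, in GB/s.
# Assumptions (not from the article): 6ch DDR4-2666 for Skylake-SP,
# 6ch DDR4-2933 for Cascade Lake, 8ch DDR4-2666 for an EPYC-style socket.

BYTES_PER_TRANSFER = 8  # one 64-bit DDR4 channel moves 8 bytes per transfer


def peak_bandwidth_gbs(channels, mega_transfers):
    """channels x MT/s x 8 bytes, converted from MB/s to GB/s (decimal units)."""
    return channels * mega_transfers * BYTES_PER_TRANSFER / 1000


skylake = peak_bandwidth_gbs(6, 2666)
cascade = peak_bandwidth_gbs(6, 2933)
eight_ch = peak_bandwidth_gbs(8, 2666)

print(f"Skylake-SP 6ch DDR4-2666: {skylake:.1f} GB/s")  # 128.0 GB/s
print(f"Cascade Lake 6ch DDR4-2933: {cascade:.1f} GB/s (+{cascade / skylake - 1:.0%})")  # 140.8, +10%
print(f"8ch DDR4-2666: {eight_ch:.1f} GB/s (+{eight_ch / skylake - 1:.0%})")  # 170.6, +33%
```

The DIMM speed bump is worth about 10%, while the 8:6 channel ratio alone gives about 33%, in the neighborhood of the 30% figure above.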

So it raises the questions:

A) What do they mean by "code changes"? Does this confirm that Rome does not, in fact, support AVX-512? (Obviously not natively, but it should at least support it by splitting it into smaller instructions.)

B) Higher clocks? Is TSMC's process or AMD's execution really so bad that Intel can effortlessly outclock Zen 2 while still maintaining decent perf/watt (which also seemed to be a major theme in the article)?
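On question A: "splitting" a wide instruction into narrower ones keeps the result identical but costs extra issue slots. First-gen Zen already does this for 256-bit AVX/AVX2, executing each op as two 128-bit halves. The toy model below (hypothetical function name, plain Python lists standing in for vector registers) shows a 512-bit add on eight doubles done as two 256-bit ops. One caveat: hardware can only split instructions it decodes; on x86, an instruction from an extension the CPU doesn't implement raises an invalid-opcode fault, so software needs a separate non-AVX-512 path, which is one plausible reading of "code changes".

```python
# Toy model of "splitting" one wide vector op into several narrower native ops.
# Hypothetical: a 512-bit add on 8 doubles run on a unit that is only
# 256 bits (4 doubles) wide. Python lists stand in for vector registers.

NATIVE_LANES = 4  # 256-bit unit / 64-bit doubles


def split_vector_add(a, b, native_lanes=NATIVE_LANES):
    """Add two equal-length vectors chunk by chunk.

    Returns (result, number of native-width ops issued).
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    result = []
    ops = 0
    for start in range(0, len(a), native_lanes):
        chunk_a = a[start:start + native_lanes]
        chunk_b = b[start:start + native_lanes]
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))
        ops += 1  # each chunk costs one native-width op
    return result, ops


a = [float(i) for i in range(8)]  # one "512-bit" vector of 8 doubles
b = [10.0] * 8
result, ops = split_vector_add(a, b)
print(result)  # [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
print(ops)     # 2: same answer, twice the issue slots of a native 512-bit unit
```

The answer is the same either way; what changes is throughput, which is why execution-unit width matters more than whether an instruction is nominally supported.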

Overall, a major disappointment. How can AMD not execute on such a process/arch lead? What happens when Intel gets Sapphire Rapids and chiplets in order?
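On question B, clock speed is only one term in peak throughput: per-node peak is sockets × cores × sustained clock × FLOPs per core per cycle, so fewer, wider cores at a higher clock can out-run more, narrower cores, or vice versa. The node configurations below are made-up placeholders, not Cascade Lake or Rome specs; in particular, the AVX-512 clock is usually below the headline clock, as the Stanzione quote hints.

```python
# Peak double-precision GFLOPS for one node:
#   sockets x cores x sustained vector clock (GHz) x DP FLOPs per core per cycle
# Every input below is a hypothetical placeholder, not a vendor spec.


def peak_gflops(sockets, cores, clock_ghz, flops_per_cycle):
    return sockets * cores * clock_ghz * flops_per_cycle


# Hypothetical node A: fewer, wider cores at a higher vector clock.
node_a = peak_gflops(sockets=2, cores=28, clock_ghz=2.0, flops_per_cycle=32)
# Hypothetical node B: more, narrower cores at a lower vector clock.
node_b = peak_gflops(sockets=2, cores=48, clock_ghz=1.8, flops_per_cycle=16)

print(f"A: {node_a:.0f} GFLOPS")  # A: 3584 GFLOPS
print(f"B: {node_b:.0f} GFLOPS")  # B: 2765 GFLOPS
```

Peak numbers only matter if the codes actually vectorize; for bandwidth-bound codes, the memory arithmetic matters more than this product.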
 

coercitiv

Diamond Member
Jan 24, 2014
The quote mentions timing repeatedly; don't you think this had a considerable effect on their decision?
price, schedules, and performance
best value right now
a little better for the time we needed this

Sure, overall the quote can be read either way, but I fail to see the major disappointment in someone evaluating a product for use in a supercomputer and having this to say about it: "Rome is a promising architecture and we expect it is going to be a very good chip".
 

french toast

Senior member
Feb 22, 2017
It is quite a concern. Obviously Cascade Lake is coming sooner in volume, which is one of their criteria, so there is that.
But they state that performance and price were also considerations, and Cascade Lake was the best balance of the three, at the expense of some efficiency.
It doesn't look good.
 

coercitiv

Diamond Member
Jan 24, 2014
But they state that performance and price were also considerations, and Cascade Lake was the best balance of the three, at the expense of some efficiency.
The compromise Zen made in terms of AVX performance had to backfire somewhere; otherwise it wouldn't be a compromise at all.

Maybe some were expecting to see Rome bulldoze its way into Intel's server market even before the product hit the market, but this level of expectation, followed by immediate disappointment at the first sign of a setback, is more indicative of the real level of trust we put in AMD than of the merits of Rome and TSMC's 7nm.
 

TheGiant

Senior member
Jun 12, 2017
Maybe Intel has already delivered Cascade Lake to some customers, as it did with Skylake-SP months before the official release.

As for 7nm Rome (and the desktop products), I'll believe it when I see it. I hope it will be good, because I am looking for a big fat CFD workstation...
 

moinmoin

Diamond Member
Jun 1, 2017
The key words are "right now". Intel has the advantage of having done a lot of scientific software optimization for its systems, which makes more conservative customers reluctant to switch CPU vendors. They probably didn't think Rome would offer enough of a price/performance advantage to justify waiting for it and then possibly having to develop software optimizations on their own.
 

IEC

Elite Member
Super Moderator
Jun 10, 2004
Stampede2 @ TACC has a bunch (1700+) of Skylake Xeon Platinum 8160s already, so it's not surprising at all that they would be keen to keep continuity with Cascade Lake.

The new kid on the block (Rome) would have to prove itself before they would take the risk of telling their users "sorry, gotta re-optimize your code, but we promise it's worth it"
 

Dayman1225

Golden Member
Aug 14, 2017
Not entirely surprising, mostly due to timing. I am not expecting to see Rome in volume until H2'19, IMO.
 

Dayman1225

Golden Member
Aug 14, 2017
I'm pretty surprised they aren't using Cascade Lake-AP.
Not announced (unless you were at that HPC event), probably not even sampling yet; it seems they are aiming for a mid-'19 release.
[Image: Intel Xeon Scalable Family roadmap]
 

The Stilt

Golden Member
Dec 5, 2015
If AMD has kept the core structure largely unchanged, it could well explain why they chose Intel.
If Rome maintains the same narrow core structure, it can only compete in "legacy" workloads, which scientific/supercomputer workloads generally are not.

Such a decision would be extremely sad to see, and IMO it would make no sense unless the die size is restricted for whatever reason. Even so, I can think of many other blocks inside the current designs which could make room for physically larger cores.
Frequency- and power-consumption-wise, I don't think there would be any issues, as AMD has more experience implementing various power-gating features than Intel does. That's the kind of thing you learn when you have to deal with inferior manufacturing processes for years.
 

jpiniero

Lifer
Oct 1, 2010
If AMD has kept the core structure largely unchanged, it could well explain why they chose Intel.
If Rome maintains the same narrow core structure, it can only compete in "legacy" workloads, which scientific/supercomputer workloads generally are not.

That's where the rumored Zen 2 + Vega 20 "APU" comes in. Whether it actually materializes remains to be seen.
 

moinmoin

Diamond Member
Jun 1, 2017
If Rome maintains the same narrow core structure, it can only compete in "legacy" workloads, which scientific/supercomputer workloads generally are not.

Such a decision would be extremely sad to see, and IMO it would make no sense unless the die size is restricted for whatever reason. Even so, I can think of many other blocks inside the current designs which could make room for physically larger cores.
With Zen 2, AMD may still be trying to cover the widest possible market with as few dies as possible. In that case, not going wide(r) yet is sensible.

As a related aside, from the article:
"Also on the machine, although not part of the expected peak floating point count, GPUs, most likely Nvidia Volta, will tack an additional three to four petaflops of single-precision performance onto Frontera as TACC tests the waters for key machine learning and molecular dynamics workloads that will perform well with lower precision."
Why not? Or rather, what would Nvidia and AMD need to do to have GPUs counted as part of the total performance of supercomputers?
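One plausible factor (an assumption; the article doesn't say) is that a machine's headline peak is normally quoted in double precision (FP64), the precision the Top500 HPL ranking uses, while the GPU figure above is single precision. Converting needs the GPU's FP64:FP32 ratio: Volta V100 runs FP64 at half its FP32 rate, while typical GeForce-class parts run at 1:32. The helper name below is hypothetical.

```python
# Rough FP64 equivalent of a GPU single-precision (FP32) figure.
# Assumption: FP64 throughput = FP32 throughput x the architecture's
# FP64:FP32 ratio (1:2 for Volta V100, 1:32 for typical GeForce parts).


def fp64_petaflops(fp32_petaflops, fp64_to_fp32_ratio):
    return fp32_petaflops * fp64_to_fp32_ratio


for fp32 in (3.0, 4.0):  # the "three to four petaflops" from the article
    volta = fp64_petaflops(fp32, 1 / 2)
    geforce = fp64_petaflops(fp32, 1 / 32)
    print(f"{fp32} PF FP32 -> {volta} PF FP64 (1:2), {geforce} PF FP64 (1:32)")
```

So even with full-rate FP64 parts, the GPU contribution to an FP64 peak would be roughly half the quoted single-precision number.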
 

The Stilt

Golden Member
Dec 5, 2015
With Zen 2, AMD may still be trying to cover the widest possible market with as few dies as possible. In that case, not going wide(r) yet is sensible.

I'm sure they are, and I expect the die(s) used in Rome to be the very same ones used in AM4 and SP3r2 (TR4) on the desktop.

AVX-512 support (regardless of whether it's native or joined) is far from paramount, but the cores should be able to execute 256-bit instructions at the same rate as Haswell and newer Intel cores.
Intel went wide on its consumer cores in 2013, so putting out a narrow design in 2019 is IMO not acceptable.
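"The same rate as Haswell" is easiest to see as peak double-precision FLOPs per core per cycle. A minimal sketch, assuming the commonly cited configurations: first-gen Zen with two 128-bit FMA-capable pipes, Haswell with two 256-bit FMA units, and two-FMA Skylake-SP parts with two 512-bit units, counting each FMA as two FLOPs per 64-bit lane.

```python
# Peak double-precision FLOPs per core per cycle from FMA pipe count and width.
# One FMA = 2 FLOPs (multiply + add) per 64-bit lane.
# Pipe counts/widths are the commonly cited figures, used here as assumptions.


def dp_flops_per_cycle(fma_pipes, vector_bits):
    lanes = vector_bits // 64  # doubles per vector
    return fma_pipes * lanes * 2


peak = {
    "Zen (2x 128-bit FMA)": dp_flops_per_cycle(2, 128),
    "Haswell (2x 256-bit FMA)": dp_flops_per_cycle(2, 256),
    "Skylake-SP (2x 512-bit FMA)": dp_flops_per_cycle(2, 512),
}
for name, flops in peak.items():
    print(f"{name}: {flops} DP FLOPs/cycle")  # 8, 16 and 32 respectively
```

Matching Haswell at 256 bits means doubling Zen's per-cycle peak at the same clock; matching a two-FMA AVX-512 core would mean quadrupling it.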
 

Hitman928

Diamond Member
Apr 15, 2012
I'm sure they are, and I expect the die(s) used in Rome to be the very same ones used in AM4 and SP3r2 (TR4) on the desktop.

AVX-512 support (regardless of whether it's native or joined) is far from paramount, but the cores should be able to execute 256-bit instructions at the same rate as Haswell and newer Intel cores.
Intel went wide on its consumer cores in 2013, so putting out a narrow design in 2019 is IMO not acceptable.

I agree. I'll be a bit disappointed if 7nm Zen isn't able to execute 256-bit AVX at the same relative throughput as Haswell.
 

The Stilt

Golden Member
Dec 5, 2015
But that narrowness is AMD's real advantage, since they can more easily fit in more cores, especially given that most workloads don't really need the width.

I think at this point we already have enough cores.
So for the consumer market, having even more is not much of an advantage.
>= 8 cores with IPC and frequency as high as possible, plus wide (>= 256-bit) execution on demand, is pretty much the optimum.

Pretty much what Intel is doing with the i9-9900K.
AMD needs to respond to that, and preferably exceed it in every way.
 

NostaSeronx

Diamond Member
Sep 18, 2011
Zen2 speculation from the Nosta...
Lowest spec:
1.5x int(6 ALU/3 ALU) and 2x fpu 4 Add + 4 Mul // SMT4
Highest spec:
2x int PRF0 - 4ALU/2AGU w/ 16 KB L0d + PRF1 - 4ALU/2AGU w/ 16 KB L0d and 4x fpu two VLIW4 SIMD4 units // Also, SMT4.

I don't get it. Zen 2 is a massive shift away from Zen and Jaguar. AMD literally had to abandon the Jaguar/Zen template for HPC optimization in Zen 2.
 

jpiniero

Lifer
Oct 1, 2010
I highly doubt that AMD will have separate designs, besides having an APU and a CPU,
i.e., the same way we currently have Zeppelin (desktop, HEDT, server) and Raven (desktop, mobile).

Agreed, but I don't see how that contradicts their desire to pack in the cores. AMD seems to like using moar corez as a marketing angle too.
 

JDG1980

Golden Member
Jul 18, 2013
There really aren't all that many applications that are parallel enough to benefit substantially from AVX, yet non-linear enough to be infeasible for GPGPU. It's not clear if it is in AMD's interest to devote a substantial amount of die space to those niche applications.
 

wahdangun

Golden Member
Feb 3, 2011
If Bulldozer hadn't been so much crap, we might have ended up in a world where HSA dominated. I feel AVX is pretty redundant if the GPU can accelerate pretty much 80% of the programs that use AVX, and maybe with more time, as GPUs become more like CPUs with their asynchronous compute, we won't need AVX at all.
 

tamz_msc

Diamond Member
Jan 5, 2017
AVX is niche? There is no need for AVX when GPUs exist? I don't mean to sound rude, but some of you guys are woefully unaware of just how many applications use AVX these days. Frostbite, UE4 and idTech use it. Driving sims like Project Cars 2 use it, and Path of Exile uses it for particle effects. The Ashes of the Singularity engine, whose devs are AMD partners, uses it. These are but a few examples in games alone. Emulators use it; one of them even uses TSX. Then there is video encoding, OpenSSL, distributed computing, plugins for video editing, etc. A lot of things use AVX that some of you might be unaware of.

The Stilt is right; Zen 2 without at least full-rate AVX2 would be extremely disappointing.
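A common way software ends up "using AVX" without requiring it is runtime dispatch: check the CPU's feature flags at startup and pick the widest supported code path, falling back otherwise. A minimal sketch of that pattern; the kernel names are hypothetical, and the feature set is passed in rather than detected (real code would query CPUID, for example through a compiler builtin, or read the flags line of /proc/cpuinfo on Linux).

```python
# Sketch of runtime ISA dispatch: pick the widest code path the CPU reports.
# Flag names mirror x86 CPUID feature names; kernel names are hypothetical.
# `cpu_features` is supplied by the caller instead of being detected here.

KERNELS = [  # ordered widest first
    ("avx512f", "kernel_avx512"),
    ("avx2", "kernel_avx2"),
    ("sse2", "kernel_sse2"),
]


def select_kernel(cpu_features):
    """Return the name of the widest kernel whose required flag is present."""
    for flag, kernel in KERNELS:
        if flag in cpu_features:
            return kernel
    return "kernel_scalar"  # portable fallback when no SIMD flag matches


# A Zen-like feature set (AVX2 but no AVX-512) lands on the AVX2 path.
print(select_kernel({"sse2", "avx", "avx2", "fma"}))
```

With this pattern, a CPU lacking AVX-512 still runs the program, just on a narrower path, which is why per-cycle 256-bit throughput matters so much for chips like Zen.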