Question Here comes A64FX: Fugaku is now the fastest supercomputer in the world

DrMrLordX · Jun 23, 2020

Japan’s Fugaku gains title as world’s fastest supercomputer

www.riken.jp

Summit has been dethroned by A64FX. That's a heck of a Linpack score, doncha think? And it did that without even using all of its nodes.

Hitman928 · Jun 24, 2020

SarahKerrigan said:
I think he was kidding.

A64FX is a pretty direct variation of the same microarchitecture Fujitsu has been iterating on since SPARC64 V, across three different instruction sets. (GS, SPARC, ARM)

You would think he was kidding, but. . .

Richie Rich said:
Keller left AMD in 1999 when AMD canceled his new big K8a core based on Alpha EV8 (EV8 was a super-wide core with SMT4, and Keller was an ex-Alpha engineer). Unfortunately, AMD decided just to evolve the K7 core and integrate the memory controller into the CPU like Alpha EV7 did. Then Keller was at P.A. Semi from the beginning, which was then bought by Apple, and laid down the first independent Apple uarch, A6 (2xALU, OoO, 32-bit ARM), and A7 Cyclone (first 64-bit ARM core ever, a pretty powerful 4xALU OoO core, similar to Intel Haswell released in the same year, 2013, so yes, Apple has had a very competitive state-of-the-art core like Intel since 2013). Keller left Apple in 2012, one year before the A7 release (but it had surely taped out), and left A8, A9 and A10 in development. He probably set goals for the 6xALU monster A11 Monsoon family, including AMX and SVE. Then he decided to help AMD return to the top and build a super-wide core with modern SIMD and SMT4 like EV8. Obviously he decided to create a hybrid of A7 and Bulldozer first (Zen1) and then, for the next uarch, to choose the ARM ISA with a super-wide 6xALU+SVE/AMX core like Apple's... and finally to implement the main feature of the revolutionary EV8, SMT4. But AMD staff were scared by the parameters he had chosen for K12; they thought he was already risking a lot by making Zen1 4xALU (remember, in 2012 there was no 4xALU core on the market; Haswell and Apple A7 came in 2013). The K12 spec was sci-fi, like the original K8a before its cancellation. So later on K12 was cut down to 4xALU and SVE and then sold to Fujitsu. Zen3 is an x86 version of K12 lite, so probably still 4xALU but with powerful FPUs similar to A64FX (no surprise, it has the same roots; also, some Zen3 leak mentioned a 50% IPC jump in FPU loads, confirming that). Since Fujitsu A64FX doesn't have SMT4, it looks like SMT4 was cut from Zen3 as well.
At Intel, Keller is responsible for the chiplet design of Alder Lake (8 big Golden Cove cores and 8 small Gracemont cores active out of a 16-core Gracemont chiplet; fully 16-core-capable chiplets will be used in the Snow Ridge server CPU platform). So yes, Apple's 6xALU core family, SVE, AMX, Fujitsu A64FX, Zen3/Zen4 and chiplet Alder Lake are all Jim Keller's babies, maybe.

Still no sources.

Thunder 57 · Jun 24, 2020

Richie Rich said:
Keller left AMD in 1999 when AMD canceled his new big K8a core based on Alpha EV8 (EV8 was a super-wide core with SMT4, and Keller was an ex-Alpha engineer). Unfortunately, AMD decided just to evolve the K7 core and integrate the memory controller into the CPU like Alpha EV7 did. Then Keller was at P.A. Semi from the beginning, which was then bought by Apple, and laid down the first independent Apple uarch, A6 (2xALU, OoO, 32-bit ARM), and A7 Cyclone (first 64-bit ARM core ever, a pretty powerful 4xALU OoO core, similar to Intel Haswell released in the same year, 2013, so yes, Apple has had a very competitive state-of-the-art core like Intel since 2013). Keller left Apple in 2012, one year before the A7 release (but it had surely taped out), and left A8, A9 and A10 in development. He probably set goals for the 6xALU monster A11 Monsoon family, including AMX and SVE. Then he decided to help AMD return to the top and build a super-wide core with modern SIMD and SMT4 like EV8. Obviously he decided to create a hybrid of A7 and Bulldozer first (Zen1) and then, for the next uarch, to choose the ARM ISA with a super-wide 6xALU+SVE/AMX core like Apple's... and finally to implement the main feature of the revolutionary EV8, SMT4. But AMD staff were scared by the parameters he had chosen for K12; they thought he was already risking a lot by making Zen1 4xALU (remember, in 2012 there was no 4xALU core on the market; Haswell and Apple A7 came in 2013). The K12 spec was sci-fi, like the original K8a before its cancellation. So later on K12 was cut down to 4xALU and SVE and then sold to Fujitsu. Zen3 is an x86 version of K12 lite, so probably still 4xALU but with powerful FPUs similar to A64FX (no surprise, it has the same roots; also, some Zen3 leak mentioned a 50% IPC jump in FPU loads, confirming that). Since Fujitsu A64FX doesn't have SMT4, it looks like SMT4 was cut from Zen3 as well.
At Intel, Keller is responsible for the chiplet design of Alder Lake (8 big Golden Cove cores and 8 small Gracemont cores active out of a 16-core Gracemont chiplet; fully 16-core-capable chiplets will be used in the Snow Ridge server CPU platform). So yes, Apple's 6xALU core family, SVE, AMX, Fujitsu A64FX, Zen3/Zen4 and chiplet Alder Lake are all Jim Keller's babies, maybe.

Paragraphs. Use them. Otherwise I will not read this.

SarahKerrigan · Jun 24, 2020

Richie Rich said:
Keller left AMD in 1999 when AMD canceled his new big K8a core based on Alpha EV8 (EV8 was a super-wide core with SMT4, and Keller was an ex-Alpha engineer). Unfortunately, AMD decided just to evolve the K7 core and integrate the memory controller into the CPU like Alpha EV7 did. Then Keller was at P.A. Semi from the beginning, which was then bought by Apple, and laid down the first independent Apple uarch, A6 (2xALU, OoO, 32-bit ARM), and A7 Cyclone (first 64-bit ARM core ever, a pretty powerful 4xALU OoO core, similar to Intel Haswell released in the same year, 2013, so yes, Apple has had a very competitive state-of-the-art core like Intel since 2013). Keller left Apple in 2012, one year before the A7 release (but it had surely taped out), and left A8, A9 and A10 in development. He probably set goals for the 6xALU monster A11 Monsoon family, including AMX and SVE. Then he decided to help AMD return to the top and build a super-wide core with modern SIMD and SMT4 like EV8. Obviously he decided to create a hybrid of A7 and Bulldozer first (Zen1) and then, for the next uarch, to choose the ARM ISA with a super-wide 6xALU+SVE/AMX core like Apple's... and finally to implement the main feature of the revolutionary EV8, SMT4. But AMD staff were scared by the parameters he had chosen for K12; they thought he was already risking a lot by making Zen1 4xALU (remember, in 2012 there was no 4xALU core on the market; Haswell and Apple A7 came in 2013). The K12 spec was sci-fi, like the original K8a before its cancellation. So later on K12 was cut down to 4xALU and SVE and then sold to Fujitsu. Zen3 is an x86 version of K12 lite, so probably still 4xALU but with powerful FPUs similar to A64FX (no surprise, it has the same roots; also, some Zen3 leak mentioned a 50% IPC jump in FPU loads, confirming that). Since Fujitsu A64FX doesn't have SMT4, it looks like SMT4 was cut from Zen3 as well.
At Intel, Keller is responsible for the chiplet design of Alder Lake (8 big Golden Cove cores and 8 small Gracemont cores active out of a 16-core Gracemont chiplet; fully 16-core-capable chiplets will be used in the Snow Ridge server CPU platform). So yes, Apple's 6xALU core family, SVE, AMX, Fujitsu A64FX, Zen3/Zen4 and chiplet Alder Lake are all Jim Keller's babies, maybe.

No. A64FX is a very clear evolution of what Fujitsu was already building. It looks almost exactly like XIfx at a microarchitectural level, just enhanced. Fujitsu has a very clear uarch family starting from SPARC64 V, and they have iterated on it for specific products in the SPARC64, SPARC64fx (HPC chips prior to A64FX), GS, and now A64FX family.

A64FX also isn't particularly oriented toward general purpose loads. It is nothing like K12, lol.

IntelUser2000 · Jun 24, 2020

DrMrLordX said:
Theoretically, had Intel's process advantage survived, we would still have AVX512-based Phi products out there doing essentially the same thing, but in the end Phi was still never all THAT great compared to GPGPU options.

The problem with Phi was that it was too narrow, which bottlenecked it in real-world HPC applications, and even in Linpack.

Had Intel's process been more in line, a similar Xeon line may have been possible. Fujitsu's A64FX gets its Flops with only 48 cores and 2.2GHz.

It's likely a better system than Summit because it's a CPU, but the perf/watt doesn't really improve, and Summit is nearly 2 years old.
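As a rough check on the "48 cores and 2.2GHz" remark, the per-chip FP64 peak can be estimated from core count, clock and vector width. This is only a sketch: it assumes two 512-bit FMA pipes per core (the "2x512-bit SVE" figure quoted later in the thread) and counts a fused multiply-add as two floating-point operations. The function and parameter names are made up for illustration.

```python
def peak_fp64_gflops(cores, ghz, vector_bits=512, fma_pipes=2):
    """Theoretical FP64 peak in GFLOPS for one chip.

    Assumptions (not from any vendor tool): each core has `fma_pipes`
    SIMD pipes of `vector_bits` width, every pipe issues one FMA per
    cycle, and an FMA counts as 2 floating-point operations.
    """
    lanes = vector_bits // 64                 # FP64 elements per vector
    flops_per_cycle = fma_pipes * lanes * 2   # per core, per cycle
    return cores * ghz * flops_per_cycle

# A64FX as described in the thread: 48 compute cores at 2.2 GHz.
a64fx_gflops = peak_fp64_gflops(cores=48, ghz=2.2)
print(f"A64FX peak: {a64fx_gflops / 1000:.2f} TFLOPS FP64")  # prints 3.38
```

Peak numbers like this say nothing about sustained Linpack or application performance; they only show how few cores are needed when each core is this wide.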

SarahKerrigan · Jun 24, 2020

IntelUser2000 said:
The problem with Phi was that it was too narrow, which bottlenecked it in real-world HPC applications, and even in Linpack.

Had Intel's process been more in line, a similar Xeon line may have been possible. Fujitsu's A64FX gets its Flops with only 48 cores and 2.2GHz.

It's likely a better system than Summit but the perf/watt doesn't really improve and Summit is nearly 2 years old.

Perf/W in Linpack didn't improve. In HPCG performance went up by several times, so if it holds to ~28MW for Fugaku and ~10MW for Summit, that's still a win. I suspect a lot of it will come down to application performance, and that's something Linpack doesn't tell us. (I expect it to be generally both significantly faster and at least somewhat more efficient than Summit, but there are realistically going to be apps that are friendly enough to GPUs that the efficiency win doesn't materialize.)

DrMrLordX · Jun 25, 2020

Richie Rich said:
So yes, Apple's 6xALU core family, SVE, AMX, Fujitsu A64FX, Zen3/Zen4 and chiplet Alder Lake are all Jim Keller's babies, maybe.

Keller had nothing to do with Golden Cove. It was too far into development by the time he joined for him to be directly responsible for it. Its successor? Sure.

IntelUser2000 said:
The problem with Phi was that it was too narrow, which bottlenecked it in real-world HPC applications, and even in Linpack.

I guess? Nearly every x86 design ever has been too narrow compared to GPGPU compute accelerators.

Richie Rich · Jun 25, 2020

SarahKerrigan said:
No. A64FX is a very clear evolution of what Fujitsu was already building. It looks almost exactly like XIfx at a microarchitectural level, just enhanced. Fujitsu has a very clear uarch family starting from SPARC64 V, and they have iterated on it for specific products in the SPARC64, SPARC64fx (HPC chips prior to A64FX), GS, and now A64FX family.

A64FX also isn't particularly oriented toward general purpose loads. It is nothing like K12, lol.

Yeah, I know. That was a joke about K12 being sold to Fujitsu.

I also joked that Keller is responsible for the 6xALU A11 Monsoon family and the SVE and AMX instruction extensions too.

Back on a serious note: if AMD hadn't canceled K12, they could be competing against Fujitsu's A64FX today. And maybe the fastest supercomputer would be AMD's. I guess AMD would have gotten access to the SVE/SVE2 instruction set at an early stage. This means that K12 would have had SVE too.

The timeline was:

Fujitsu A64FX manufactured 2020
A64FX start of development 2016
SVE specifications 2014-2016
AMD K12 was canceled 2015

Maybe Keller wanted to adopt SVE for K12 and rework the FPUs?
Maybe Keller wanted to adopt SVE for Zen3 as well (the x86 sister core of K12) and AMD management didn't have enough courage to step out of Intel's AVX shadow (like they did with the AMD64 extension)?

Either way, the cancellation of K12 was a horrible, horrible mistake by AMD. They did it to Keller for the second time (the first was canceling his much more ambitious K8a in 1999), and it turned out that Keller was right in both cases.

IntelUser2000 · Jun 25, 2020

DrMrLordX said:
I guess? Nearly every x86 design ever has been too narrow compared to GPGPU compute accelerators.

No, Phi targeted applications that were for server CPUs but with more vectors and more threads. You are talking about something different, which is vector width.

In those very applications it was narrow. The 2-issue unit limited performance. So an ideal Phi CPU will in every generation improve both scalar and vector performance significantly.

moinmoin · Jun 25, 2020

The article in OP had several mentions of Fugaku helping with "Society 5.0" which sounded odd to me, so this is what I got:

Society 5.0

www8.cao.go.jp

Very Japanese approach to life if that's your thing.

myocardia · Jun 25, 2020

Hitman928 said:
Very impressive, though I am a bit surprised at the power use.

I just read the article, then read it again looking specifically for mention of power usage, and I saw nothing at all concerning power usage. Where did you see mention of power usage, in a separate article? If so, mind linking it for us?

Hitman928 · Jun 25, 2020

myocardia said:
I just read the article, then read it again looking specifically for mention of power usage, and I saw nothing at all concerning power usage. Where did you see mention of power usage, in a separate article? If so, mind linking it for us?

It's listed in the top 500 list.

June 2020 | TOP500

www.top500.org

myocardia · Jun 25, 2020

AnandThenMan said:
My dual Celeron system can outperform that.

1,000,000 years Celeron vs. 1 second Fugaku

I assume you mean the 300A Celerons? Running at 450 MHz, of course. Hmm, that means the in-order 1.6 GHz Intel Atom CPU in an MSI Wind could complete the same task that the dual 450 MHz 300A CPUs could, although it would take more like 10,000,000 years, instead of the speedy 1,000,000 years that the Celerons would take.

AnandThenMan · Jun 25, 2020

myocardia said:
I assume you mean the 300A Celerons?

Heck no, nothing that blazing fast. I'm talking Socket 370 Celeron.

myocardia · Jun 26, 2020

Hitman928 said:
It's listed in the top 500 list.

June 2020 | TOP500

www.top500.org

Thanks, I had clicked on that link, but only read the article. I had not noticed there was a list below the article, and wow, 28,300 kW is crazy, especially without running any GPUs, even more so since they have not finished adding all of the nodes they're planning on having. They're going to be right at 30 MW with all of the nodes. Still, once you realize it is up to 2.8X as fast as the 2nd-place system, the power numbers become much more in line with expectations.

myocardia · Jun 26, 2020

AnandThenMan said:
Heck no, nothing that blazing fast. I'm talking Socket 370 Celeron.

Haha, those were reasonably fast (compared to their same-socket Pentiums), compared to the next generation Celerons, the Socket 478 Celerons. Those were absolutely hated, by more or less everyone.

Richie Rich · Jun 28, 2020

myocardia said:
Thanks, I had clicked on that link, but only read the article. I had not noticed there was a list below the article, and wow, 28,300 kW is crazy, especially without running any GPUs, even more so since they have not finished adding all of the nodes they're planning on having. They're going to be right at 30 MW with all of the nodes. Still, once you realize it is up to 2.8X as fast as the 2nd-place system, the power numbers become much more in line with expectations.

I'd like to see how Fugaku's efficiency stands against CPU-only Xeon/Epyc systems:

Fugaku (ARM A64FX, SVE) ............... Rmax 414,530 TFlops / 28,335 kW .............. efficiency 14.63 TFlops/kW
Summit (Power9 + Volta GV100) ......... Rmax 148,600 TFlops / 10,096 kW .............. efficiency 14.72 TFlops/kW
Selene (Epyc 7742 + Ampere A100) ...... Rmax 27,580 TFlops / 1,344 kW ................ efficiency 20.52 TFlops/kW
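The efficiency column above is just Rmax divided by power. A minimal sketch that reproduces it from the figures as posted (the dictionary layout and function name are illustrative only):

```python
# (Rmax in TFlops, power in kW), copied from the figures posted above.
systems = {
    "Fugaku": (414_530, 28_335),
    "Summit": (148_600, 10_096),
    "Selene": (27_580, 1_344),
}

def efficiency(rmax_tflops, power_kw):
    """Linpack efficiency in TFlops per kW."""
    return rmax_tflops / power_kw

eff = {name: round(efficiency(*figures), 2) for name, figures in systems.items()}

# Raw Linpack speed of Fugaku relative to Summit, for comparison.
speedup = round(systems["Fugaku"][0] / systems["Summit"][0], 1)

for name, value in eff.items():
    print(f"{name}: {value} TFlops/kW")
print(f"Fugaku vs Summit Rmax: {speedup}x")
```

On these numbers Fugaku and Summit are nearly tied per kW in Linpack, while Fugaku is roughly 2.8x faster in absolute terms.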

That's damn good efficiency for CPU-only Fugaku. That's huge competition for GPUs in terms of price. Nvidia is way overpriced IMHO. In 2021 ARM's new Matterhorn core line-up is coming with SVE2 vector SIMD, so maybe we will see some Matterhorn-based supercomputers too. Who knows, maybe ARM is preparing not only the A58, A79 and X2 cores. Maybe ARM will release an F2 core with wider FPUs specifically for supercomputers (based on X2).

Cortex X1 .................. 4x128-bit NEON
Cortex X2 could have 4x256-bit SVE2
Cortex F2 could have 4x1024-bit SVE2 (4x faster than Fugaku's A64FX with 2x512-bit SVE)

That would massacre Nvidia based supercomputers.
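The "4x faster" figure in the list above comes from comparing total SIMD bits per cycle per core. A small sketch of that arithmetic follows; the X2 and F2 entries are this post's speculation, not announced products, and the comparison ignores clock speed, core count and memory bandwidth.

```python
# (number of SIMD units, width of each unit in bits) per core.
# The "X2" and "F2" rows are hypothetical configurations from the post above.
designs = {
    "A64FX": (2, 512),
    "Cortex X1": (4, 128),
    "Cortex X2 (speculative)": (4, 256),
    "Cortex F2 (speculative)": (4, 1024),
}

def simd_bits_per_cycle(units, width_bits):
    """Total vector datapath width per core, in bits per cycle."""
    return units * width_bits

bits = {name: simd_bits_per_cycle(*cfg) for name, cfg in designs.items()}
baseline = bits["A64FX"]

for name, total in bits.items():
    print(f"{name}: {total} bits/cycle, {total / baseline:.2f}x A64FX")
```

By this measure the Cortex X1 has half the per-core vector width of A64FX, the speculative X2 would match it, and only the speculative F2 would reach 4x.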

AMD64Blondie · Jun 29, 2020

They sure fooled me. I had thought AMD was bringing back the Athlon 64 FX.

Ahh... nostalgia.

Miguel Melicias · Jun 17, 2021

Maybe the true hero is Andrew Heller, the founder of HAL Computer Systems
HAL Computer Systems - Wikipedia
