Question [Anand] A Peek Into Graviton2: Amazon's Neoverse N1 Server Chip First Impressions

Hitman928

Diamond Member
Apr 15, 2012
5,243
7,791
136

They have a comparison against current AWS offerings from AMD and Intel instances (Rome instances not yet available).

115096.png


However, I grabbed the SPEC2006Int numbers for Rome from a previous article to compare. That wasn't a cloud instance, so I don't know how much that would affect the Rome performance.

1583851592910.png

Actual power use for the Graviton2 system is not available, and Amazon didn't release a TDP number. Andrei is estimating between 80 W and 110 W. Given the Ampere 80 core ARM CPU is 210 W at 3 GHz (unclear if 3 GHz is all core turbo at 210 W or if all core turbo is lower), this CPU with 64 cores at 2.5 GHz I would put at the higher level of his range, maybe higher. That depends on how much of the power use is uncore (power won't scale as expected with frequency and core count if the uncore is a significant portion of TDP) and on what the actual all core frequency of the Ampere chip is.
 

mikegg

Golden Member
Jan 30, 2010
1,755
411
136
As an AMD stock owner, I'm a bit nervous seeing these numbers. The server space is one area they're banking on for growth but it seems like ARM is about to kick some major x86 ass over the next few years.

Intel and AMD are no longer each other's biggest competitor in the server space.
 

Hitman928

1583852910278.png

Perf/GHz table. I believe the libquantum score for Rome is not accurate, though (or it reflects a bug in the GCC build), as it actually scores worse than first gen Epyc here despite the test being very heavily influenced by the cache/memory system, which was greatly enhanced in Rome.
 

DisEnchantment

Golden Member
Mar 3, 2017
1,601
5,780
136
Just my 2¢. First off, I acknowledge the SPEC performance is impressive for ST. I will refrain from comparing it to any platform.

If you’re an EC2 customer today, and unless you’re tied to x86 for whatever reason, you’d be stupid not to switch over to Graviton2 instances once they become available, as the cost savings will be significant.

I am not sure I (we) would buy an AWS subscription based on SPEC benchmarks. As a disclaimer, I (we) don't have an AWS subscription, but I (we) do have an Azure Premium SKU subscription; I'm trying to be fair to AWS.
I would have liked to see more comprehensive benchmarks before coming to the conclusion above.

Something like PHP or Java benchmarks would be useful to see whether you can really benefit from an overall migration to this instance.
Compression performance (it's critical when serving content to ensure you don't clog the network with your traffic, e.g. 'Content-Encoding: gzip, deflate')
(De)encryption performance (it's critical when you serve encrypted content, e.g. HTTPS)
Database performance like PostgreSQL
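
A minimal sketch of the kind of compression micro-benchmark meant here, assuming Python's standard `zlib` module as a stand-in for a web server's gzip/deflate path. The function name, the payload, and the choice of compression level 6 are illustrative assumptions; this is nowhere near a full server benchmark, just something to run on two instance types side by side.

```python
import time
import zlib


def compression_throughput(payload: bytes, level: int = 6, repeats: int = 5) -> float:
    """Return the best observed zlib compression throughput in MB/s."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        zlib.compress(payload, level)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
    # Guard against a zero reading from the timer on very small payloads.
    best = max(best, 1e-9)
    return len(payload) / best / 1e6


if __name__ == "__main__":
    # Hypothetical payload: repetitive markup standing in for an HTML response body.
    payload = b"<div class='row'>hello graviton</div>\n" * 50_000
    print(f"zlib level 6: {compression_throughput(payload):.1f} MB/s")
```

Running the same script on an x86 instance and on a Graviton2 instance gives a single-thread number per platform; real content, and a real server stack, would shift the results.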

I can see API gateways or AWS Lambda being good use cases, but again, without other benchmarks like Java or PHP, it is hard to digest a recommendation based purely on SPEC.

Additionally, I want to add:
If you have a cluster of nodes already running the same SW, you don't just switch to a different platform like that. People do live migration from a running machine to do a HW upgrade, so for existing infrastructure it will be fairly hard. Compare this to migration between Intel and AMD: you can do a cold migration from Intel to AMD or a live migration from Intel to Intel.
For some reason, the article completely skips how suitable this is for other workloads like CI/CD, HPC, CDN, and other cloud workloads.
 

Hitman928


This is meant to be just a preview article, so the actual tests run are limited. I agree, though: coming to such a strong conclusion based only on running SPEC seems to be a bit... premature.

Also, I don't know about the cloud deployment/delivery applications, but I would expect these CPUs to be slaughtered in HPC workloads by their x86 competitors. The Graviton2 and Ampere CPUs are not really built to be competitive in HPC workloads.
 

tamz_msc

Diamond Member
Jan 5, 2017
3,770
3,590
136
Also, I don't know about the cloud deployment/delivery applications, but I would expect these CPUs to be slaughtered in HPC workloads by their x86 competitors. The Graviton2 and Ampere CPUs are not really built to be competitive in HPC workloads.
I don't think so. HPC isn't all about AVX, and SPEC2017fp has many workloads derived from HPC applications. This ARM CPU is quite competitive in those workloads.
 

Hitman928

I don't think so. HPC isn't all about AVX, and SPEC2017fp has many workloads derived from HPC applications. This ARM CPU is quite competitive in those workloads.

It does well when it's 64 core ARM versus 32 core 1st gen Epyc (which is what it was tested against). I don't think it will look so hot against a 64 core Rome processor, and it will get crushed in the majority of HPC applications.
 

Nothingness

Platinum Member
Jul 3, 2013
2,400
733
136
It does well when it's 64 core ARM versus 32 core 1st gen Epyc (which is what it was tested against). I don't think it will look so hot against a 64 core Rome processor, and it will get crushed in the majority of HPC applications.
If you want purely FP HPC then Fujitsu A64FX currently is the best ARM chip. But I'm not sure we'll see it outside of big Top500 systems. Hopefully some other SVE ARM chip will be released in the coming years that will close the FP gap even further.
 

Andrei.

Senior member
Jan 26, 2015
316
386
136
Something like PHP or Java benchmarks would be useful to see whether you can really benefit from an overall migration to this instance.
Compression performance (it's critical when serving content to ensure you don't clog the network with your traffic, e.g. 'Content-Encoding: gzip, deflate')
Such workloads are covered in the suite: see Perlbench for PHP-like stuff, and there's compression as well.

The point of SPEC is that it casts a wide enough net of workload types with different characteristics that it should be an aggregate representation of performance.
(De)encryption performance (it's critical when you serve encrypted content, e.g. HTTPS)
Database performance like PostgreSQL
We don't have a good Linux test suite right now, I'll look into it in the coming months.

Given the Ampere 80 core ARM CPU is 210 W at 3 GHz (unclear if 3 GHz is all core turbo at 210 W or if all core turbo is lower), this CPU with 64 cores at 2.5 GHz I would put at the higher level of his range
The Altra's 3GHz/210W figures are for all cores, there's no other clock shenanigans. Going from 2.6GHz to 3.1GHz raises the power by 1.8x according to Arm, hence my estimates for the Graviton2.
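
As a rough sketch of how a figure inside the 80 W to 110 W range can fall out of these numbers (not necessarily the method used for the article): scale the Altra's 210 W linearly from 80 to 64 cores, then divide by the 1.8x factor quoted for 2.6 GHz to 3.1 GHz. The linear core scaling, treating the 3.0 GHz to 2.5 GHz step as equivalent to that 0.5 GHz step, and applying the factor to the whole chip are all assumptions; the function name is hypothetical.

```python
def naive_graviton2_power(
    altra_watts: float = 210.0,
    altra_cores: int = 80,
    target_cores: int = 64,
    freq_power_ratio: float = 1.8,
) -> float:
    """Naive whole-chip estimate: linear in core count, then one frequency step down."""
    # Assumption: power scales linearly with core count (ignores fixed uncore cost).
    same_clock_watts = altra_watts * target_cores / altra_cores
    # Assumption: the quoted 1.8x factor applies to the whole chip, not just core+L2.
    return same_clock_watts / freq_power_ratio


if __name__ == "__main__":
    print(f"{naive_graviton2_power():.1f} W")  # 210 * 64/80 = 168 W, then 168 / 1.8
```

With the defaults this gives roughly 93 W. Any power that does not scale with the cores pushes the real figure above that.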
 

Hitman928

If you want purely FP HPC then Fujitsu A64FX currently is the best ARM chip. But I'm not sure we'll see it outside of big Top500 systems. Hopefully some other SVE ARM chip will be released in the coming years that will close the FP gap even further.

Yeah, the A64FX definitely looks interesting. Should easily outdo Rome in peak FLOPs but I'm not familiar enough with the chip to really comment further.
 

Hitman928

The Altra's 3GHz/210W figures are for all cores, there's no other clock shenanigans. Going from 2.6GHz to 3.1GHz raises the power by 1.8x according to Arm, hence my estimates for the Graviton2.

Thanks for the clarification on the freq/TDP. The 1.8x quoted from ARM, is that per core power or SOC power?
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
Thanks for the clarification on the freq/TDP. The 1.8x quoted from ARM, is that per core power or SOC power?

That is for upping the supply voltage in order to clock higher. Affects the whole SoC.
Unless, of course, they have a second supply for the uncore, but why should they?
 

Hitman928

That is for upping the supply voltage in order to clock higher. Affects the whole SoC.
Unless, of course, they have a second supply for the uncore, but why should they?

AMD has a separate "SOC" voltage level and it wouldn't surprise me if ARM does as well, especially at 7nm, but that's not the crux of the question. The material I saw from ARM was quoting the power used by each core, so saying the power increases (or decreases) by X with frequency doesn't tell you how much the whole chip increases or decreases, just the core power.

i.e.

If a chip uses 200 W at 3 GHz and 15% of that (30 W) is uncore, and you say that going down to 2 GHz reduces core power by 50%, how much is uncore reduced by? If it has its own voltage line, probably not much. So then assuming the chip will use 100 W is wrong, because the uncore is still using basically the same amount of power. Your chip would then use 115 W (85 W of core power plus the unchanged 30 W of uncore) instead of the 100 W you would get by assuming the 50% was chip power. The uncore is responsible for I/O and the memory controller, both of which you probably don't want to scale down just because your cores do.

I don't know how the Graviton 2 chips break down in power use and I was wondering if ARM was saying the whole chip uses 80% more power from 2.6 GHz to 3.1 GHz or if that is the increase in core power. I suspect it is the latter.
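
The 200 W example above can be written out as a short sketch. The function name and the `uncore_scale` parameter are hypothetical, and the 15% uncore share is the illustrative figure from the example, not a measured Graviton2 number.

```python
def chip_power_after_rescale(
    total_watts: float,
    uncore_fraction: float,
    core_scale: float,
    uncore_scale: float = 1.0,
) -> float:
    """Split chip power into core and uncore, then scale each part separately."""
    uncore_watts = total_watts * uncore_fraction
    core_watts = total_watts - uncore_watts
    return core_watts * core_scale + uncore_watts * uncore_scale


if __name__ == "__main__":
    # Core power halves, uncore stays put (separate voltage line).
    core_only = chip_power_after_rescale(200.0, 0.15, 0.5)
    # Core and uncore both halve (the 50% applied to the whole chip).
    whole_chip = chip_power_after_rescale(200.0, 0.15, 0.5, uncore_scale=0.5)
    print(f"core-only scaling:  {core_only:.0f} W")
    print(f"whole-chip scaling: {whole_chip:.0f} W")
```

The gap between the two results grows with the uncore share, which is why the power breakdown matters before applying a per-core scaling factor to a full SOC.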
 

Thala


As I said, my expectation is that they are clocking the whole chip faster, which makes sense, as just clocking the cores higher scales badly. So 80% more power for the whole SoC.
 

Hitman928

As I said, my expectation is that they are clocking the whole chip faster, which makes sense, as just clocking the cores higher scales badly. So 80% more power for the whole SoC.

But then, when you scale your cores up to reach higher frequencies, you're wasting excess power in the uncore, which is running at the same speeds as before. Seems like a waste of power. Most I/O also needs specific voltages and shouldn't scale at all.
 

Hitman928

I see where Andrei got it from: it's from the Neoverse N1 slide. I kind of overlooked it, as Graviton isn't Neoverse, although it is based on it.


03_Infra%20Tech%20Day%202019_Filippo%20Neoverse%20N1%20FINAL%20WM15.jpg


As I suspected, the 1.8x increase is core+L2 cache scaling, this won't apply to the full chip. I'd really be interested to see how the power use breaks down between cores, mesh, and controllers. I doubt we'll get that but if anyone has a link, it'd be very interesting to review.
 

Thala

But then, when you scale your cores up to reach higher frequencies, you're wasting excess power in the uncore, which is running at the same speeds as before. Seems like a waste of power. Most I/O also needs specific voltages and shouldn't scale at all.

Not sure what the argument is; we are talking about a particular implementation and not about future cores, right? Not sure if I understand you correctly...
Analog parts have a different voltage. I was talking about the digital supply only, so the whole mesh network, caches, and memory controllers (minus the PHY).
 

Hitman928

Not sure what the argument is; we are talking about a particular implementation and not about future cores, right? Not sure if I understand you correctly...
Analog parts have a different voltage. I was talking about the digital supply only, so the whole mesh network, caches, and memory controllers (minus the PHY).

I was asking if the 1.8x increase in power quoted was for cores only or for the full SOC. It was for core+L2 only. Andrei uses that same 1.8x figure to try to calculate the full SOC TDP of Graviton 2. I'm just pointing out that you can't do that without knowing more about the power breakdown for the whole SOC.
 

Thala

I was asking if the 1.8x increase in power quoted was for cores only or for the full SOC. It was for core+L2 only. Andrei uses that same 1.8x figure to try to calculate the full SOC TDP of Graviton 2. I'm just pointing out that you can't do that without knowing more about the power breakdown for the whole SOC.

And I was pointing out that if the voltage increase affects most of the digital domain, then Andrei's calculation is pretty much correct. In addition, Andrei's calculation is in line with ARM's claims about the total power of a 64 core reference implementation, which is 105 W TDP.
 

Hitman928

And I was pointing out that if the voltage increase affects most of the digital domain, then Andrei's calculation is pretty much correct. In addition, Andrei's calculation is in line with ARM's claims about the total power of a 64 core reference implementation, which is 105 W TDP.

Modern CPUs aren't restricted to one digital domain internally. Though I'm less familiar with ARM, I'd be surprised if they were different in this regard, given the SOC design.

The 105 W TDP isn't based on hardware measurements, it's based on RTL simulations with an unknown configuration for the SOC portion.
 

Hitman928

Here's another piece of Neoverse material ARM put out. Their TDP estimate for data center configured Neoverse based CPUs starts at 150 W for 64 cores:

n1-announcement-power-ranges-768x309.png

 

Hitman928

BTW, notice I never said Andrei's estimate is wrong, just that I don't agree with the way he got there and it's most likely at the upper end of his estimated range (and maybe a little higher) even though he says that his upper end is pessimistic.
 

Atari2600

Golden Member
Nov 22, 2016
1,409
1,655
136
It's crippled by having only 32MB L3. The design itself shows promise.

Very impressive given where they have come from.


But if folks are scared to move from Intel to AMD, imagine the frighteners about moving away from x86! (obviously, in some cases, that won't apply)