Question [Anand] A Peek Into Graviton2: Amazon's Neoverse N1 Server Chip First Impressions


Hitman928

Diamond Member
Apr 15, 2012
5,285
7,917
136

The article has a comparison against the current AWS AMD and Intel instances (Rome instances not yet available).

[attached chart: SPEC comparison of the Graviton2 instance against current AWS AMD and Intel instances]


However, I grabbed the SPECint2006 numbers for Rome from a previous article to compare. That wasn't a cloud instance, so I don't know how much that would affect the Rome performance.

[attached chart: the same comparison with Rome SPECint2006 numbers added from a previous article]

Actual power use for the Graviton2 system isn't available and Amazon didn't release a TDP number; Andrei estimates between 80 W and 110 W. Given that the 80-core Ampere ARM CPU is 210 W at 3 GHz (unclear whether 3 GHz is the all-core turbo at 210 W or whether the all-core turbo is lower), I would put this 64-core CPU at 2.5 GHz at the higher end of his range, maybe higher, depending on how much of the power is uncore (i.e. the power won't scale down with frequency and core count as expected, because the uncore will be a significant portion of the TDP) and what Ampere's actual all-core frequency is.
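
To illustrate why I lean toward the upper end (or above it), here is a minimal sketch of that naive scaling from the Ampere figure. The uncore-share values are my own assumptions, and the lower voltage needed for 2.5 GHz, which this ignores, is what would pull the real number back down toward Andrei's range:

    # Rough sketch of the scaling argument (all inputs are assumptions, not measurements).
    # Per-core power is scaled linearly with frequency; the uncore is treated as a fixed
    # share of the package power that does not shrink with core count or clock.
    ampere_tdp_w   = 210     # 80-core Ampere part at 3.0 GHz (claimed figure)
    ampere_cores   = 80
    ampere_ghz     = 3.0
    graviton_cores = 64
    graviton_ghz   = 2.5

    for uncore_share in (0.0, 0.2, 0.3):          # guesses for the uncore's share of TDP
        uncore_w   = ampere_tdp_w * uncore_share
        per_core_w = (ampere_tdp_w - uncore_w) / ampere_cores
        estimate   = uncore_w + per_core_w * graviton_cores * (graviton_ghz / ampere_ghz)
        print(f"uncore share {uncore_share:.0%}: ~{estimate:.0f} W")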
 

Thala

Golden Member
Nov 12, 2014
1,355
653
136
Modern CPUs aren't restricted to one digital domain internally. Though I'm less familiar with ARM, I'd be surprised if they were different in this regard given the SoC design.

The 105 W TDP isn't based on hardware measurements; it's based on RTL simulations with an unknown configuration for the SoC portion.

This has nothing to do with being modern. ARM CPUs typically do not have a turbo mode, which generally requires the addition of another voltage supply rail. As I said, I expect most of the logic is on a single supply, then another for the SRAMs, another for analog, and potentially a fourth for some peripherals.
Besides, gate-level or RTL-level power simulations are very accurate these days; you don't need power measurements to make an accurate prediction. ARM is very specific that the 105 W TDP estimate refers to a 64-core hyperscale reference design running at 2.6 GHz.

Not sure what issues you have with ARM's simulations and Andrei's scaling, given that both arrive at the same conclusion. It is not necessarily 80 W like the lower bound of Andrei's estimate, but it is certainly below 105 W.
 

insertcarehere

Senior member
Jan 17, 2013
639
607
136
As an AMD stock owner, I'm a bit nervous seeing these numbers. The server space is one area they're banking on for growth, but it seems like ARM is about to kick some major x86 ass over the next few years.

Intel and AMD are no longer each other's biggest competitor in the server space.

And to think the N1 core is effectively "last-generation", as it is based on ARM's A76 core, not the newest A77.
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
What is Graviton2's die size, do we know?
  • 64 cores at 1.4 mm2 each = 90 mm2
  • 32 MB L3$ in Zen 2 at 7nm is 35 mm2
  • DRAM + PCIe interfaces ... 200 mm2?
  • my estimate is around 325-350 mm2 (quick arithmetic after the next list)
  • IMHO Graviton2 must be very cheap to produce

I don't like only 32 MB of L3$ for a 64-core CPU:
  • Graviton2: L1 64 kB/core, L2 1 MB/core, L3 0.5 MB/core
  • Zen 1: L1 64 kB/core, L2 0.5 MB/core, L3 2 MB/core
  • Zen 2: L1 32 kB/core, L2 0.5 MB/core, L3 4 MB/core
  • Graviton2 looks kind of unbalanced on L3$; it would need 2 MB per core, so 128 MB total.
  • 128 MB of L3$ would add about 105 mm2 more, so 425-450 mm2 total. Maybe that is too much in terms of yields.
  • Or alternatively shrink the L2$ to 0.5 MB and go for 1 MB of L3$ per core while maintaining the same die space.
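
A quick back-of-the-envelope sketch of the arithmetic behind the two lists above; every per-block area (core size, L3 density, I/O) is my guess from the list, not a published figure:

    # Back-of-the-envelope die-area arithmetic using the assumed numbers above.
    core_mm2      = 1.4        # assumed area of one N1 core at 7nm
    cores         = 64
    l3_mm2_per_mb = 35 / 32    # assumed from ~35 mm2 for 32 MB of Zen 2 L3 at 7nm
    io_mm2        = 200        # guess for DRAM controllers, PCIe and other uncore

    def die_mm2(l3_mb):
        return cores * core_mm2 + l3_mb * l3_mm2_per_mb + io_mm2

    print(f"32 MB L3:  ~{die_mm2(32):.0f} mm2")    # ~325 mm2, the lower estimate above
    print(f"64 MB L3:  ~{die_mm2(64):.0f} mm2")    # the 1 MB/core alternative
    print(f"128 MB L3: ~{die_mm2(128):.0f} mm2")   # ~430 mm2, the 2 MB/core option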

Surprisingly high ST IPC for an A76-based core (+30% is way too much). I don't understand how they did that. It cannot be just the boosted memory system; IMHO the ROB, buffers and schedulers might have been enlarged as well.


Graviton2 is ARM's breakthrough into the server market. It destroys Zen 1-based EPYC in everything (price, power consumption, ST and MT performance, IPC per thread).

Zen 2 EPYC will be a tougher nut to crack (especially in MT performance), though with 64 cores it runs at a lower clock of around 2.6 GHz (Zen 1 EPYC was 2.9 GHz) with a higher TDP. Even if Rome matches the ST and MT performance (and probably exceeds it in MT), Amazon's manufacturing cost is a fraction of Rome's price. Not to mention the electricity cost cut by half (extending the CPU replacement period and melting the cost down even further).

It's clear that Amazon with ARM CPUs will attack servers on price and service cost, and there is nothing Intel and AMD can do about that in the near future (and Nuvia will attack from the other side: unmatched high performance will result in x86 being squeezed everywhere). Let's see how the 2nd gen, AKA Neoverse N2/Zeus based on the A77 (which has ~25% higher IPC than the A76), will perform against Zen 3/Tiger Lake.

Is it possible to benchmark another 64-core server ARM chip, HiSilicon's Kunpeng 920, please? It should be a Chinese-developed core (4x ALU + SMT).
https://www.tomshardware.com/news/h...-motherboard-for-kunpeng-920-armv8-processors
 

DisEnchantment

Golden Member
Mar 3, 2017
1,607
5,799
136
BTW, notice I never said Andrei's estimate is wrong, just that I don't agree with the way he got there and it's most likely at the upper end of his estimated range (and maybe a little higher) even though he says that his upper end is pessimistic.

I would not read too much into this. I would have preferred if this article were more along the lines of: hey, there are some new offerings on the market to try out if you have a service to run in the cloud. I am sure you are not stupid if you decide against it.

I mean, if you are a serious cloud service operator:
- You don't pay the amount quoted in this article for hosting your services.
- You don't need a SPEC benchmark from somewhere to make your make/buy decisions. You can get a free premium subscription to try out your services on the new instances. Each service is unique. It is not like there is a bunch of apps (like game benchmarking, if you will) and everyone runs the same thing. Every service has its own unique code base, framework, dataset, programming language, and problems it is solving. These guys literally pester you to try out their stuff. They will be more than happy to let you try it for free.
- If you base your make/buy decision on this, what will the hundreds (if not thousands) of guys in your department do? It is their day and night job to ensure zero downtime, 24/7.
- Can you deploy this in all geos? What if your services in West-EU, East/West-US and so on run on this, and then you have a different platform running in China on Alibaba/Tencent infra where AWS cannot operate?
- If you operate on this HW you need to know the roadmap: are you going to be able to scale your workload in the future, or do you have to take the service offline every time there is an infra upgrade?

We don't need to keep adding things here, but if you are working in IT/cloud/edge you probably have your own laundry list of things to go through before you can even consider this.
Same as how AMD thought they could just take market share from Intel, but then reality set in; and that is assuming even a cold migration is possible and your code is more or less compatible.
 

lobz

Platinum Member
Feb 10, 2017
2,057
2,856
136
Just my 2¢: first off, I acknowledge the SPEC performance is impressive for ST. I will refrain from comparing it to any platform.



I am not very sure I (we) will buy an AWS subscription based on SPEC benchmarks. As a disclaimer, I (we) don't have an AWS subscription, but I (we) do have an Azure Premium SKU subscription; still, I'm trying to be fair to AWS.
I would have liked to see more comprehensive benchmarks to come to the conclusion above.

Something like PHP or Java benchmarks would be useful to see whether you can really benefit from an overall migration to this instance.
Compression performance (critical when serving content, to ensure you don't clog the network with your traffic, i.e. 'Content-Encoding: gzip, deflate'); see the sketch after this list for the kind of thing I mean.
Encryption/decryption performance (critical when you serve encrypted content, i.e. HTTPS).
Database performance, e.g. PostgreSQL.
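
As a purely hypothetical illustration of the compression point (nothing from the article), a minimal single-threaded gzip throughput check might look like this; the payload mix, compression level and repeat count are arbitrary choices, and you would run it, or better, your real content pipeline, on both instance types:

    # Minimal sketch of a compression microbenchmark: gzip throughput on a
    # synthetic payload. Not a standard benchmark; numbers only make sense
    # when compared across instance types with the same payload.
    import gzip
    import os
    import time

    payload = os.urandom(1 << 20) + b"hello graviton " * 100_000  # mixed random/compressible data
    reps = 20

    start = time.perf_counter()
    for _ in range(reps):
        gzip.compress(payload, compresslevel=6)
    elapsed = time.perf_counter() - start

    mb_processed = len(payload) * reps / 1e6
    print(f"gzip level 6: {mb_processed / elapsed:.1f} MB/s input throughput")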

I can think of API gateways or AWS Lambda being a good use case, but again, without other benchmarks like Java or PHP it will be hard to digest a recommendation based purely on SPEC.

Additionally, I want to add:
If you have a cluster of nodes already running the same SW, you don't just switch to a different platform like that. People do live migrations from running machines to do HW upgrades, so for existing infrastructure it will be fairly hard. Compare this to migrating between Intel and AMD: you can do a cold migration from Intel to AMD, or a live migration from Intel to Intel.
As for how suitable this is for other workloads like CI/CD, HPC, CDN and other cloud workloads, the article skips that completely, for some reason.
What are you babbling about? JUST BUY IT!

/facepalm