Discussion AWS Graviton2 64 vCPU Arm CPU Heightens War of Intel Betrayal

exquisitechar · Dec 3, 2019

https://www.servethehome.com/aws-graviton2-64-core-arm-cpu-heightens-war-of-intel-betrayal/

Pretty big deal for ARM in servers. Interested in seeing a comparison between this and Rome.

Thunder 57 · Mar 25, 2020

Richie Rich said:
Let's see what price Amazon will set for Rome systems. I would be surprised if it is lower than Graviton2. IMHO ARM will dominate cloud systems sooner or later. However, x86 will still win where it can use higher clocks (>4GHz), like HPC systems. But only until Nuvia CPUs arrive. That's a pretty dark future for x86.

Of course it will cost less than Rome; it's in Amazon's best interest to sell Graviton even if it is at lower margins (in the short term anyway). Also, whether Nuvia succeeds is a big if. Others have tried, and that's without the entire legal team of Apple on their backs.

exquisitechar · Mar 25, 2020

Richie Rich said:
Let's see what price Amazon will set for Rome systems. I would be surprised if it is lower than Graviton2.

Of course it won’t be lower, Graviton2 is Amazon’s in-house silicon.

amrnuke · Mar 25, 2020

Richie Rich said:
250% => 1/2.5 = 40% .... recalculated to different base (Zen2)

I like your calculation a lot. It makes sense that Rome will be a tough competitor for Graviton2. But there are two points:
1) Graviton2 die size is about 300mm2 ... that's the price of a 7nm Vega2 GPU .... Amazon's cost per Graviton2 CPU is about 200-300$? That's the magic of buying an ARM license and making your own CPU.
2) Electricity cost is also important. Especially when your CPU price is super low, most of the cost is the electricity bill.

Let's see what price Amazon will set for Rome systems. I would be surprised if it is lower than Graviton2. IMHO ARM will dominate cloud systems sooner or later. However, x86 will still win where it can use higher clocks (>4GHz), like HPC systems. But only until Nuvia CPUs arrive. That's a pretty dark future for x86.

It's not just the clocks. 7742 clocks higher than Graviton2 but clocks alone don't explain why 1 x 7742 appears to beat 2 x Graviton2. So when you calculate power consumption you have to include that it's really 2P Graviton2 or 2 x Graviton2 vs 1P 7742. So double the Graviton2 power usage figures.

But yes, we are seeing a future where ARM chips could start to compete. I just don't think they have achieved that yet, and it doesn't really look like it's close for now.

There are just so many caveats to the performance published on the Graviton2.

Hitman928 · Mar 25, 2020

Nothingness said:
Can you (or someone else that doesn't have me on ignore) double check your computation? Where did the 40% speed come from?

I think (no idea if I'm correct though) he's taking the Graviton2 numbers and comparing them against published Naples numbers, which are significantly better than what Anandtech got for Naples (using a different compiler), then taking that number and estimating Rome based upon its improvement over Naples.

Benchmark	Anandtech Epyc 2x7601 (2.2 GHz base)	Verified SPEC Epyc 2x7551 (2.0 GHz base)	Verified SPEC vs Anandtech
400.perlbench	2020	2310.80	14.40%
401.bzip2	1280	1224.73	-4.32%
403.gcc	1400	1628.78	16.34%
429.mcf	837	2562.38	206.14%
445.gobmk	1780	1822.41	2.38%
456.hmmer	1700	3382.92	99.00%
458.sjeng	1820	1664.62	-8.54%
462.libquantum	1060	20305.65	1815.63%
464.h264ref	2680	2755.88	2.83%
471.omnetpp	705	1078.68	53.00%
473.astar	1080	1387.80	28.50%
483.xalancbmk	1240	2154.26	73.73%
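
For anyone who wants to re-check the last column, here is a minimal Python sketch. It assumes the column is simply (verified SPEC score / Anandtech score - 1) x 100; the names `scores` and `percent_difference` are made up for illustration, and the numbers are copied from the table above.

```python
# Scores copied from the table above: (Anandtech 2x7601, verified SPEC 2x7551).
scores = {
    "400.perlbench": (2020, 2310.80),
    "401.bzip2": (1280, 1224.73),
    "403.gcc": (1400, 1628.78),
    "429.mcf": (837, 2562.38),
    "445.gobmk": (1780, 1822.41),
    "456.hmmer": (1700, 3382.92),
    "458.sjeng": (1820, 1664.62),
    "462.libquantum": (1060, 20305.65),
    "464.h264ref": (2680, 2755.88),
    "471.omnetpp": (705, 1078.68),
    "473.astar": (1080, 1387.80),
    "483.xalancbmk": (1240, 2154.26),
}


def percent_difference(anandtech, verified):
    """Difference of the verified SPEC score over the Anandtech score, in percent."""
    return (verified / anandtech - 1.0) * 100.0


deltas = {name: percent_difference(a, v) for name, (a, v) in scores.items()}

# Print the benchmarks sorted by how far the two sources disagree.
for name, delta in sorted(deltas.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:16s} {delta:+9.2f}%")
```

Sorted this way, the biggest gaps are 462.libquantum, 429.mcf and 456.hmmer, while several other subtests are within a few percent of each other.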

CINT2006 Result: ASUSTeK Computer Inc. Asus RS700A-E9, AMD EPYC 7551 (test sponsored by Advanced Micro Devices)

CINT2006 result for Asus RS700A-E9, AMD EPYC 7551; base: 2050; peak: 2300

www.spec.org

DrMrLordX · Mar 25, 2020

Nothingness said:
I gave a link to a paper with data about WSS. What more do you want? Do you want me to redo the study?

Fiiiiine that'll have to do.

The paper seems to think that 80% of all Integer workloads have a working set size of at least 32MB, but I can't find anything in the methodology where they differentiate between single-threaded and multi-threaded workloads.

Richie Rich · Mar 25, 2020

exquisitechar said:
Of course it won’t be lower, Graviton2 is Amazon’s in-house silicon.

If Rome's system price and electricity bill were low enough, I'm pretty sure Amazon would price it lower than Graviton2. Amazon needs to stay competitive against other cloud providers.
But everybody knows this is not gonna happen. Rome is expensive (cheaper than Intel though) and takes more power than ARM.

amrnuke said:
It's not just the clocks. 7742 clocks higher than Graviton2 but clocks alone don't explain why 1 x 7742 appears to beat 2 x Graviton2. So when you calculate power consumption you have to include that it's really 2P Graviton2 or 2 x Graviton2 vs 1P 7742. So double the Graviton2 power usage figures.
But yes, we are seeing a future where ARM chips could start to compete. I just don't think they have achieved that yet, and it doesn't really look like it's close for now.
There are just so many caveats to the performance published on the Graviton2.

Personally I don't like A76 (3xALU+1xJump) based systems like Graviton2 and Altra because this core delivers only 80% of Zen2's IPC. Being crippled by a tiny L3 cache doesn't help either.
A77 (4xALU+2xJump) is so much better (20-25% higher IPC) for the cost of only 17% more transistors. This is the real competitor for new Rome and Milan systems, especially if they pair A77 CPUs with 2 MB L3$ per core.
If Amazon and Ampere jump directly to A78 for 2021, which is possible (A78 will be a 40-45% IPC jump over A76), then even Milan/Zen3 will be in big trouble from a performance point of view (for a fraction of the price as a bonus).

DrMrLordX · Mar 25, 2020

Richie Rich said:
Personally I don't like A76 (3xALU+1xJump) based systems like Graviton2 and Altra because this core delivers only 80% of Zen2's IPC.

How about we stop talking about what we like "personally" and start thinking about what server room admins, cloud players, and so forth need? Do you really think there's a boardroom meeting somewhere where a CTO stands up and says, "gee guys, we were thinking about that AWS instance using their in-house ARM CPU, but apparently it has only 80% the IPC of Rome so we aren't feeling good about that right now"?

There's a certain amount of vendor inertia keeping people on Intel systems and on software that will run on Intel systems. That segment of the market is already a non-starter for Graviton2. It's a very tough nut for anyone else to crack, including AMD. Putting that aside, I would expect the bean counters and server room guys to try to dial in the solution that's going to get them the best bang/buck while staying within various constraints (maximum power to the server room/cooling limits). Right now that looks like Rome, easily. A head-to-head comparison would clear things up nicely.

amrnuke · Mar 25, 2020

Nothingness said:
Can you (or someone else that doesn't have me on ignore) double check your computation? Where did the 40% speed come from?

Sorry for not responding.
I just used the AT Graviton2 review, and the AT 7742 2P analysis divided by two.
However, I definitely made a huge error in Excel - somehow the formula didn't copy and it didn't divide by two. Should have double checked.
Graviton2 is 80% the speed of 1/2 of a 2P 7742 by my calculations.

MT	Graviton2 (64 vCPU)	EPYC 7742 2P	EPYC 7742 (1/2 of 2P)	G2 as % of 7742
400	1613	4820	2410	66.93%
401	924	3250	1625	56.86%
403	701	3540	1770	39.60%
429	597	1540	770	77.53%
445	1692	4170	2085	81.15%
456	2904	6480	3240	89.63%
458	1605	3900	1950	82.31%
462	725	1180	590	122.88%
464	2821	6400	3200	88.16%
471	574	1510	755	76.03%
473	806	1550	775	104.00%
483	1048	2870	1435	73.03%
Average				79.84%

ST	Graviton2	EPYC 7742	G2 as % of 7742
400	30.05	43.7	68.76%
401	19.21	27.2	70.63%
403	34.49	42.6	80.96%
429	29.4	39.6	74.24%
445	27.73	32.7	84.80%
456	48.5	60.5	80.17%
458	25.93	27.6	93.95%
462	93.79	72.3	129.72%
464	46.05	60.4	76.24%
471	23.19	23	100.83%
473	19.84	25.4	78.11%
483	32.21	47.8	67.38%
Average			83.82%
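
The corrected spreadsheet math can be reproduced with a short Python sketch. The variable names are hypothetical and the scores are copied from the tables above. The two averages in the tables are plain arithmetic means of the per-test percentages; the geometric mean is added here only for comparison, since SPEC's own composite scores are geometric means.

```python
from statistics import geometric_mean, mean  # geometric_mean needs Python 3.8+

# Multi-threaded scores: test -> (Graviton2 64 vCPU, EPYC 7742 2P).
mt = {
    400: (1613, 4820), 401: (924, 3250), 403: (701, 3540), 429: (597, 1540),
    445: (1692, 4170), 456: (2904, 6480), 458: (1605, 3900), 462: (725, 1180),
    464: (2821, 6400), 471: (574, 1510), 473: (806, 1550), 483: (1048, 2870),
}

# Single-threaded scores: test -> (Graviton2, EPYC 7742).
st = {
    400: (30.05, 43.7), 401: (19.21, 27.2), 403: (34.49, 42.6), 429: (29.4, 39.6),
    445: (27.73, 32.7), 456: (48.5, 60.5), 458: (25.93, 27.6), 462: (93.79, 72.3),
    464: (46.05, 60.4), 471: (23.19, 23), 473: (19.84, 25.4), 483: (32.21, 47.8),
}

# The 2P result is halved to approximate a single 7742, as in the post.
mt_ratios = [g2 / (epyc_2p / 2) for g2, epyc_2p in mt.values()]
st_ratios = [g2 / epyc for g2, epyc in st.values()]

for label, ratios in (("MT", mt_ratios), ("ST", st_ratios)):
    print(f"{label}: arithmetic mean {mean(ratios):.2%}, "
          f"geometric mean {geometric_mean(ratios):.2%}")
```

Halving the 2P score assumes perfect scaling from one socket to two, so the single-socket 7742 figure is an estimate rather than a measurement.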

Richie Rich · Mar 26, 2020

amrnuke said:
However, I definitely made a huge error in Excel - somehow the formula didn't copy and it didn't divide by two. Should have double checked.
Graviton2 is 80% the speed of 1/2 of a 2P 7742 by my calculations.

That matches the 70-80% range I'd expect when comparing 64-core vs. 64-core. The 83% average is even a bit better, despite the small 32MB L3$.

Zen2/Rome has 128 threads, so Graviton2's performance per thread is 1.6x higher (and A77 will provide 2.1x).
Zen1/Naples has 64 threads and provides 115% of performance, so Graviton2's performance per thread is 1.7x higher.
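
The 1.6x per-thread figure for Rome comes from a simple normalization, sketched below in Python. The helper name `per_unit_ratio` is hypothetical, and the 0.80 input is the rounded Graviton2 vs. single 7742 result from amrnuke's table. The per-core line is included only to show how much the answer depends on whether you divide by threads or by cores.

```python
def per_unit_ratio(perf_ratio, units_a, units_b):
    """Ratio of (perf_a / units_a) to (perf_b / units_b), given perf_ratio = perf_a / perf_b."""
    return perf_ratio * units_b / units_a


g2_vs_rome = 0.80  # Graviton2 at roughly 80% of one EPYC 7742 (multi-threaded)

# Normalized per hardware thread: 64 Graviton2 threads vs. 128 Rome threads.
per_thread = per_unit_ratio(g2_vs_rome, units_a=64, units_b=128)

# Normalized per physical core: 64 cores on both chips.
per_core = per_unit_ratio(g2_vs_rome, units_a=64, units_b=64)

print(f"per thread: {per_thread:.2f}x, per core: {per_core:.2f}x")
```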

Zen2 is 225W TDP ............. price 7500$
Zen1 is 2x180W TDP ......... price 2x3300$
Gra2 is 90W TDP .......... .... price 300$ ?

x86 is expensive to buy (CPU 20x, overall system about 2x?).
x86 is expensive to run (1.5-2.0x).
x86 offers lower performance per thread (1.6x; with SMT off x86 will lose 20% of MT performance).

Even Zen4 at 5nm cannot beat Gra2 economically or in performance per thread.
How does x86 want to fight this?

DrMrLordX · Mar 26, 2020

Richie Rich said:
Gra2 is 90W TDP .......... .... price 300$ ?

Seriously? Where did you come up with that number?

NTMBK · Mar 26, 2020

Oh good, we're back to ALU counting.

There is a LOT more to performance than just execution resources.

Richie Rich · Mar 26, 2020

DrMrLordX said:
Seriously? Where did you come up with that number?

Radeon VII cost 550$ .... die size 331mm2 at 7nm TSMC.
- if we exclude margins and memory cost then the real price for silicon is about 200-300$ max.

Gra2 die size:

- CPU 64* 1.4mm2 = 90mm2 + 34mm2 L3$ ... that's 124mm2
- MEM ctrl, IO PCIe links.... (EPYC Rome IO die has 416mm2 on 14nm)... so 200mm2 at 7nm
- total ... 330 mm2 (similar to GPU Radeon VII)

Even if Graviton2 were 500mm2 and priced at 800$ .... it's still dirt cheap compared to any x86 CPU.
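
The area estimate above is just a sum, sketched here in Python. Every input is a guess from this post rather than a measured value, and the variable names are made up for illustration.

```python
# All inputs are the estimates from the post, not measured die data.
core_count = 64
core_area_mm2 = 1.4        # assumed area per core
l3_area_mm2 = 34.0         # assumed area of the L3 cache
uncore_area_mm2 = 200.0    # assumed memory controllers, IO and PCIe at 7nm
radeon_vii_area_mm2 = 331.0

cpu_area_mm2 = core_count * core_area_mm2
total_area_mm2 = cpu_area_mm2 + l3_area_mm2 + uncore_area_mm2

print(f"cores: {cpu_area_mm2:.1f} mm2")
print(f"total: {total_area_mm2:.1f} mm2 "
      f"({total_area_mm2 / radeon_vii_area_mm2:.0%} of Radeon VII)")
```

The sum comes out slightly under the rounded 330 mm2 quoted in the post. It says nothing about yield, packaging or design cost, so it only bounds the raw silicon area under these assumptions.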

DrMrLordX · Mar 26, 2020

Richie Rich said:
Radeon VII cost 550$ ....

STOP

Radeon VII is nothing but a Mi50 sold on the consumer market. Does a Mi50 cost $550? No. Rethink your "reasoning".

Gideon · Mar 26, 2020

DrMrLordX said:
Seriously? Where did you come up with that number?

Yeah, this is flat-out trolling with zero evidence. Sigh, this is getting tedious.

And again with the ALU counts ...

This is the annotated die-shot of Zen 2 core (from this post). The SIMD units doubled from Zen 1 but ALU block remained about the same (they even added one AGU instead).

Boy aren't AMD chip-designers idiots look how much free space there is for ALUs! AMD should just replace everything with a huge ALU block, I bet they could fit a couple hundred in there (OK maybe save a little for SMT-16), my wouldn't that be a fast chip!!

/s

Richie Rich · Mar 26, 2020

DrMrLordX said:
STOP

Radeon VII is nothing but a Mi50 sold on the consumer market. Does a Mi50 cost $550? No. Rethink your "reasoning".

Do you think that AMD pays TSMC 700$ for Vega2 silicon and sells it for 550$? Do you think they are losing money there? That's not possible, price dumping is illegal. STOP and rethink the difference between real silicon cost and final price in different markets (nV focuses on compute because the margins there are huge compared to the consumer market).

Gideon said:
Boy aren't AMD chip-designers idiots look how much free space there is for ALUs! AMD should just replace everything with a huge ALU block, I bet they could fit a couple hundred in there (OK maybe save a little for SMT-16), my wouldn't that be a fast chip!!

If you talk about hundreds of ALUs you have no clue what you are talking about.
And yes, the number of execution units in the back end is one of the important metrics. It always has been, no matter the ISA.
That's the reason why ARM is so strong. They moved from the 2xALU A73 to the much wider A76 design with 3xALU+1xJump pretty fast. And one year later came A77 with 4xALU+2xJump (a lot of instruction throughput was doubled in A77). That evolution is much faster than what we see in the x86 world.

DrMrLordX · Mar 26, 2020

Richie Rich said:
Do you think that AMD pays TSMC 700$ for a Vega2 GPU and sells it for 550$?

Do you think AMD pays TSMC $7500 (or more) for a Rome2 chip? Do you think Intel pays $20k to crank out Xeon Platinum systems that get wrecked by a 3990x? No! Amazon isn't even going to sell Graviton2 to anyone, and if they did, they'd slap on the same huge margins that every other server hardware player expects before taking into account targeted discounts.

You can't say that Rome costs $7500 and then claim that Graviton2 is only a $300 CPU. That's completely insane.

Gideon · Mar 26, 2020

Richie Rich said:
If you talk about hundreds of ALUs you have no clue what you are talking about.

Yes, this was heavy sarcasm directed at the "moar ALUs moar SMT" meme you keep bringing up in every single post.

Overall I agree that ARM itself (and ARM custom designs) has been executing much better than x86 for a while (in terms of added performance per year).

It has also been obvious for a while that clock speeds won't really improve much with forthcoming nodes, yet core area will shrink (adding hotspots). Therefore the only way to gain more single-threaded performance is to go with wider cores. Both of the x86 players need to do it (and they will).

This does not mean that "AMD has to add X ALUs to design Y by time Z!" Jamming more execution units into a design alone might very well make it worse. Just look at recent Samsung Exynos models. They are wider than Snapdragon, yet perform worse and use more power.

EDIT: And btw, I wouldn't mind at all if Zen 3 added an extra ALU (it would have to be coupled with extra AGUs as well), but they would have to redesign the chip significantly for it to show any benefit at all, rather than just limiting clocks and burning more power, or just wasting die space as was the case with Phenom:

AMD claims the third ALU/AGU pair went mostly unused in Phenom II, and as a result it's been removed from Bulldozer.

The Bulldozer Review: AMD FX-8150 Tested

www.anandtech.com

amrnuke · Mar 26, 2020

Richie Rich said:
Do you think that AMD pays TSMC 700$ for Vega2 silicon and sells it for 550$? Do you think they are losing money there? That's not possible, price dumping is illegal. STOP and rethink the difference between real silicon cost and final price in different markets (nV focuses on compute because the margins there are huge compared to the consumer market).

Exactly, and Amazon will do the same for G2. The silicon itself may only cost $300 (but really, who knows?) but they have to pay the engineers who designed it and fund further R&D. IMO they should price G2 instances just below x86 instances. Pricing them according to what they themselves paid for the silicon would be silly; it would be wasted profit potential.

Thunder 57 · Mar 26, 2020

@Richie Rich

Yup, x86 stole your lunch money, we get it. When Zen 3 comes out and is SMT2 only, will you admit you were wrong? When x86 is alive and well in a year or two, will you admit you were wrong?

Gideon · Mar 26, 2020

Thunder 57 said:
@Richie Rich

Yup, x86 stole your lunch money, we get it. When Zen 3 comes out and is SMT2 only, will you admit you were wrong? When x86 is alive and well in a year or two, will you admit you were wrong?

Exactly. I'm very much willing to believe that x86 is in trouble in long term. Nuvia seems to have the tech and marketing people to start converting hyperscalers in a couple of years etc ...

But it all takes time!

It took Intel almost a decade to take over the Data Center market, and another decade to conquer the HPC world. Even if ARM offers 2x perf/watt tomorrow it will only make dents in the market share for years (just look at the situation with Rome, which is a lot more like a drop-in replacement). Though as with most things, when ARM finally does take off, it will probably move much quicker than people expect. But even then there will still be use cases where x86 isn't replaced for a decade or so (legacy codebases, etc.)

Richie Rich · Mar 26, 2020

Gideon said:
Exactly. I'm very much willing to believe that x86 is in trouble in long term. Nuvia seems to have the tech and marketing people to start converting hyperscalers in a couple of years etc ...

But it all takes time!

It took Intel almost a decade to take over the Data Center market, and another decade to conquer the HPC world. Even if ARM offers 2x perf/watt tomorrow it will only make dents in the market share for years (just look at the situation with Rome, which is a lot more like a drop-in replacement). Though as with most things, when ARM finally does take off, it will probably move much quicker than people expect. But even then there will still be use cases where x86 isn't replaced for a decade or so (legacy codebases, etc.)

Nice pictures!
Regarding time: if you look carefully you will see x86 took a 50% share (majority) in just 4 years, in both data centers and HPC. ARM is able to take a majority in 4 years, maybe even faster, because Amazon doesn't need to convince anybody in this long HW chain.

Amazon is the biggest cloud provider with 34% market share worldwide. It depends how long customers are willing to pay double the price for x86. This market dynamic is hard to predict, but it will not take a decade. The next competitors, MS Azure with 18% and Google Cloud with 8%, have to stay on x86 for a while (at double the price). Amazon can grow quickly up to 50% market share, let's say in about two years. MS and Google will have to follow and buy Ampere Altra ARM systems to stay competitive on price. It looks like ARM can take 50% market share in just two years. The internetless '90s were much slower in every way than today.

@amrnuke
What R&D do you mean? Graviton2 is based on ARM's licensed Neoverse architecture. Mem ctrl, IO and PCIe are licensed stuff. Assembling all of those together is way easier and faster than AMD's complete R&D. ARM's flexible license costs 75,000$ per year; sure, N1 will be more expensive, but the server market needs 12 million units per year. If Amazon aims for 10% market share then there will be 1.2 million G2 units per year. Even if Amazon pays 10 million for all licenses, that would be less than 10$ per CPU, which is nothing.
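
The per-CPU license figure is a one-line amortization, sketched here in Python. The function name is hypothetical, and the inputs are the guesses from this post (a 10 million total license bill, 12 million server units per year, 10% share), spread over a single year of volume.

```python
def license_cost_per_cpu(total_license_cost, server_units_per_year, market_share):
    """Spread a total license bill over one year of shipped CPUs."""
    units = server_units_per_year * market_share
    return total_license_cost / units


# Inputs are the post's guesses, not published figures.
cost = license_cost_per_cpu(
    total_license_cost=10_000_000,
    server_units_per_year=12_000_000,
    market_share=0.10,
)
print(f"license cost per CPU: ${cost:.2f}")
```

With these inputs the result is a bit over 8$ per CPU, which the post rounds up to 10$. It covers licensing only, not the engineering staff or tape-out costs raised earlier in the thread.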

DrMrLordX · Mar 26, 2020

amrnuke said:
IMO they should price G2 instances just below x86 instances.

Depends on what instances they want to support in the future. AWS is big enough that they can try to force customers onto their own in-house tech if it meets their future needs. As it stands they seem to have priced Graviton2 instances low, probably to cope with the relatively-higher demand for x86 as an ISA (moving to a Graviton2 instance incurs the costs of converting your codebase).

Richie Rich said:
It depends how long customers are willing to pay double the price for x86.

They aren't paying double the price for x86. From Anandtech's review of existing AWS instance costs per workload:

ARM News, Reviews and Insights | Tom's Hardware

Discover the Tom's Hardware take on the ARM product range, with news, reviews and benchmarking for the hardcore PC enthusiast.

www.anandtech.com

Note that Rome is not included. But without taking into account the costs/difficulties of converting the codebase/switching to new software, Graviton2 instances are 40% cheaper on average. Not 50% cheaper. For real-world workloads, you might have to switch to an entirely different suite of software and/or rewrite stuff in-house to switch. Some FOSS makes the leap easily. In-house stuff has to be recompiled, at least. Anything using advanced SIMD instructions needs a rewrite. Plenty of other software would need a port. And remember, Rome is not included. Nor is IceLake-SP (gee, wonder why).
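
To put "40% cheaper" and "double the price" on the same scale, here is a small Python sketch. The function name is made up for illustration; it only converts a Graviton2 discount into the matching x86 price multiple and ignores migration costs entirely.

```python
def x86_price_multiple(graviton_discount):
    """x86 price as a multiple of the Graviton2 price for a given discount (0 to 1)."""
    return 1.0 / (1.0 - graviton_discount)


for discount in (0.40, 0.50):
    print(f"{discount:.0%} cheaper -> x86 costs {x86_price_multiple(discount):.2f}x")
```

So "double the price" would require Graviton2 to be 50% cheaper, while a 40% discount corresponds to x86 costing about 1.67x.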

RetroZombie · Mar 26, 2020

According to these guys:
How Intel Stole the x86 Market
Top Epyc costs 400$ to make
Intel four-core version: 55$
Intel high core count dies: 135$ to 165$
Intel extreme core count dies: 178$ to 200$

However I don't know how they calculate the price.
Is it just the TSMC manufacturing cost?
Does it include R&D?
Amortization over time?
Do manufacturing costs go down over time?

And how are you guys calculating prices?

Let me just give some software pricing examples:
- Microsoft Office always costs the same, no matter if it's just released or about to be discontinued, and it doesn't matter if Microsoft sells 100M copies or 1000B copies; the price is fixed.
- Games: a just-released AAA game is 70$, a few months later 50$, after about 1-2 years 20$, after 3-5 years 5$.

Hitman928 · Mar 26, 2020

RetroZombie said:
According to these guys:
How Intel Stole the x86 Market
Top Epyc costs 400$ to make
Intel four-core version: 55$
Intel high core count dies: 135$ to 165$
Intel extreme core count dies: 178$ to 200$

However I don't know how they calculate the price.
Is it just the TSMC manufacturing cost?
Does it include R&D?
Amortization over time?
Do manufacturing costs go down over time?

And how are you guys calculating prices?

Let me just give some software pricing examples:
- Microsoft Office always costs the same, no matter if it's just released or about to be discontinued, and it doesn't matter if Microsoft sells 100M copies or 1000B copies; the price is fixed.
- Games: a just-released AAA game is 70$, a few months later 50$, after about 1-2 years 20$, after 3-5 years 5$.

I'm not going to listen to this for over an hour, can you point out the relevant timestamp? I got a couple of minutes in and his cost analysis was fairly. . . pedestrian.

Markfw · Mar 26, 2020

Hitman928 said:
I'm not going to listen to this for over an hour, can you point out the relevant timestamp? I got a couple of minutes in and his cost analysis was fairly. . . pedestrian.

I agree. You can't compare cost at 22nm and 5nm (not that he used those exactly). The point is, just for the mfg part, the cost goes up exponentially for each node. He is saying things like "take the retail, divide by 2 and that's the high discounted price", etc... No real numbers.
