Discussion AWS Graviton2 64 vCPU Arm CPU Heightens War of Intel Betrayal

Status
Not open for further replies.

Thunder 57

Platinum Member
Aug 19, 2007
2,647
3,706
136
Let's see what price Amazon will set for a Rome system. I would be surprised if it is lower than Graviton2. IMHO ARM will dominate cloud systems sooner or later. However, x86 will still win where it can use higher clocks (>4 GHz), as in HPC systems, but only until Nuvia CPUs arrive. That's a pretty dark future for x86.

Of course it will cost less than Rome; it's in Amazon's best interest to sell Graviton even if it is at lower margins (in the short term anyway). Also, there's a big "if" around whether Nuvia succeeds. Others have tried, and that's without the entire legal team of Apple on their backs.
 

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
250% => 1/2.5 = 40% .... recalculated to different base (Zen2)


I like your calculation a lot. It makes sense that Rome will be a tough competitor for Graviton2. But there are two points:
1) Graviton2 die size is about 300 mm² ... that's the price of a 7nm Vega2 GPU ... so Amazon's cost per Graviton2 CPU is about $200-300? That's the magic of buying an ARM license and making your own CPU.
2) Electricity cost is also important. Especially when your CPU price is super low, most of the cost is the electricity bill.

Let's see what price Amazon will set for a Rome system. I would be surprised if it is lower than Graviton2. IMHO ARM will dominate cloud systems sooner or later. However, x86 will still win where it can use higher clocks (>4 GHz), as in HPC systems, but only until Nuvia CPUs arrive. That's a pretty dark future for x86.
It's not just the clocks. 7742 clocks higher than Graviton2 but clocks alone don't explain why 1 x 7742 appears to beat 2 x Graviton2. So when you calculate power consumption you have to include that it's really 2P Graviton2 or 2 x Graviton2 vs 1P 7742. So double the Graviton2 power usage figures.

But yes, we are seeing a future where ARM chips could start to compete. I just don't think they have achieved that yet, and it doesn't really look like they are close for now.

There are just so many caveats to the performance published on the Graviton2.
 

Hitman928

Diamond Member
Apr 15, 2012
5,177
7,628
136
Can you (or someone else that doesn't have me on ignore :D) double check your computation? Where did the 40% speed come from?

I think (no idea if I'm correct, though) that he's taking the Graviton2 numbers and comparing them against published Naples numbers, which are significantly better than what Anandtech got for Naples (using a different compiler), then taking that number and estimating Rome based on its improvement over Naples.

| Benchmark | Anandtech Epyc 2x7601 (2.2 GHz base) | Verified SPEC Epyc 2x7551 (2.0 GHz base) | Verified SPEC vs Anandtech |
|---|---|---|---|
| 400.perlbench | 2020 | 2310.80 | 14.40% |
| 401.bzip2 | 1280 | 1224.73 | -4.32% |
| 403.gcc | 1400 | 1628.78 | 16.34% |
| 429.mcf | 837 | 2562.38 | 206.14% |
| 445.gobmk | 1780 | 1822.41 | 2.38% |
| 456.hmmer | 1700 | 3382.92 | 99.00% |
| 458.sjeng | 1820 | 1664.62 | -8.54% |
| 462.libquantum | 1060 | 20305.65 | 1815.63% |
| 464.h264ref | 2680 | 2755.88 | 2.83% |
| 471.omnetpp | 705 | 1078.68 | 53.00% |
| 473.astar | 1080 | 1387.80 | 28.50% |
| 483.xalancbmk | 1240 | 2154.26 | 73.73% |
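
For anyone who wants to re-check the last column of the table above, here is a minimal sketch that recomputes it. The scores are transcribed from the post; the function name `pct_vs_anandtech` and the dictionary layout are illustrative choices, not anything from the original spreadsheet.

```python
# Scores transcribed from the table above:
# benchmark -> (Anandtech 2x7601 score, verified SPEC 2x7551 score)
scores = {
    "400.perlbench": (2020, 2310.80),
    "401.bzip2": (1280, 1224.73),
    "403.gcc": (1400, 1628.78),
    "429.mcf": (837, 2562.38),
    "445.gobmk": (1780, 1822.41),
    "456.hmmer": (1700, 3382.92),
    "458.sjeng": (1820, 1664.62),
    "462.libquantum": (1060, 20305.65),
    "464.h264ref": (2680, 2755.88),
    "471.omnetpp": (705, 1078.68),
    "473.astar": (1080, 1387.80),
    "483.xalancbmk": (1240, 2154.26),
}


def pct_vs_anandtech(anandtech, verified):
    """Percent by which the verified score is above (+) or below (-) Anandtech's."""
    return (verified / anandtech - 1.0) * 100.0


# Hypothetical helper dict: benchmark -> percent difference
deltas = {name: pct_vs_anandtech(a, v) for name, (a, v) in scores.items()}

for name, delta in deltas.items():
    print(f"{name:16s} {delta:+9.2f}%")
```

The two outliers (429.mcf and 462.libquantum) are where compiler differences move the score the most, which is why the choice of baseline matters so much for the 40% figure.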

 

DrMrLordX

Lifer
Apr 27, 2000
21,582
10,785
136
I gave a link to a paper with data about WSS. What more do you want? Do you want me to redo the study? :)

Fiiiiine that'll have to do.

The paper seems to think that 80% of all Integer workloads have a working set size of at least 32 MB, but I can't find anything in the methodology where they differentiate between single-threaded and multi-threaded workloads.
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
Of course it won’t be lower, Graviton2 is Amazon’s in-house silicon.
If the Rome system price and electricity bill were low enough, I'm pretty sure Amazon would set the price lower than Graviton2. Amazon needs to stay competitive against other cloud providers.
But everybody knows this is not going to happen. Rome is expensive (cheaper than Intel, though) and takes more power than ARM.




It's not just the clocks. 7742 clocks higher than Graviton2 but clocks alone don't explain why 1 x 7742 appears to beat 2 x Graviton2. So when you calculate power consumption you have to include that it's really 2P Graviton2 or 2 x Graviton2 vs 1P 7742. So double the Graviton2 power usage figures.
But yes, we are seeing a future where ARM chips could start to compete. I just don't think they have achieved that yet, and it doesn't really look like they are close for now.
There are just so many caveats to the performance published on the Graviton2.
Personally I don't like A76 (3xALU+1xJump) based systems like Graviton2 and Altra because this core delivers only 80% of Zen2's IPC. Being crippled by a tiny L3 cache doesn't help either.
A77 (4xALU+2xJump) is so much better (20-25% higher IPC) for the cost of only 17% more transistors. This is the real competitor for the new Rome and Milan systems, especially if they pair A77 CPUs with 2 MB of L3$ per core.
If Amazon and Ampere jump directly to A78 for 2021, which is possible (A78 will be a 40-45% IPC jump over A76), then even Milan/Zen3 will be in big trouble from a performance point of view (for a fraction of the price as a bonus).
 

DrMrLordX

Lifer
Apr 27, 2000
21,582
10,785
136
Personally I don't like A76 (3xALU+1xJump) based systems like Graviton2 and Altra because this core delivers only 80% of Zen2's IPC.

How about we stop talking about what we like "personally" and start thinking about what server room admins, cloud players, and so forth need? Do you really think there's a boardroom meeting somewhere where a CTO stands up and says, "gee guys, we were thinking about that AWS instance using their in-house ARM CPU, but apparently it has only 80% the IPC of Rome so we aren't feeling good about that right now"?

There's a certain amount of vendor inertia keeping people on Intel systems and on software that will run on Intel systems. That segment of the market is already a non-starter for Graviton2. It's a very tough nut for anyone else to crack, including AMD. Putting that aside, I would expect the bean counters and server room guys to try to dial in the solution that's going to get them the best bang/buck while staying within various constraints (maximum power to the server room/cooling limits). Right now that looks like Rome, easily. A head-to-head comparison would clear things up nicely.
 

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
Can you (or someone else that doesn't have me on ignore :D) double check your computation? Where did the 40% speed come from?
Sorry for not responding.
I just used the AT Graviton2 review, and the AT 7742 2P analysis divided by two.
However, I definitely made a huge error in Excel - somehow the formula didn't copy and it didn't divide by two. Should have double checked.
Graviton2 is 80% the speed of 1/2 of a 2P 7742 by my calculations.

MT

| Test | Graviton2 (64 vCPU) | EPYC 7742 2P | EPYC 7742 (1/2 of 2P) | G2 as % of 7742 |
|---|---|---|---|---|
| 400 | 1613 | 4820 | 2410 | 66.93% |
| 401 | 924 | 3250 | 1625 | 56.86% |
| 403 | 701 | 3540 | 1770 | 39.60% |
| 429 | 597 | 1540 | 770 | 77.53% |
| 445 | 1692 | 4170 | 2085 | 81.15% |
| 456 | 2904 | 6480 | 3240 | 89.63% |
| 458 | 1605 | 3900 | 1950 | 82.31% |
| 462 | 725 | 1180 | 590 | 122.88% |
| 464 | 2821 | 6400 | 3200 | 88.16% |
| 471 | 574 | 1510 | 755 | 76.03% |
| 473 | 806 | 1550 | 775 | 104.00% |
| 483 | 1048 | 2870 | 1435 | 73.03% |
| Average | | | | 79.84% |

ST

| Test | G2 | 7742 | G2 as % of 7742 |
|---|---|---|---|
| 400 | 30.05 | 43.7 | 68.76% |
| 401 | 19.21 | 27.2 | 70.63% |
| 403 | 34.49 | 42.6 | 80.96% |
| 429 | 29.4 | 39.6 | 74.24% |
| 445 | 27.73 | 32.7 | 84.80% |
| 456 | 48.5 | 60.5 | 80.17% |
| 458 | 25.93 | 27.6 | 93.95% |
| 462 | 93.79 | 72.3 | 129.72% |
| 464 | 46.05 | 60.4 | 76.24% |
| 471 | 23.19 | 23 | 100.83% |
| 473 | 19.84 | 25.4 | 78.11% |
| 483 | 32.21 | 47.8 | 67.38% |
| Average | | | 83.82% |
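
The corrected spreadsheet step (halve the 2P score, then take the ratio and average it) can be sketched as follows. The numbers are transcribed from the post; the variable names are illustrative, and the geometric mean is included only as an alternative summary that the post itself does not use.

```python
from statistics import mean, geometric_mean

# MT scores from the post: test -> (Graviton2 64 vCPU, EPYC 7742 2P)
mt = {
    400: (1613, 4820), 401: (924, 3250), 403: (701, 3540), 429: (597, 1540),
    445: (1692, 4170), 456: (2904, 6480), 458: (1605, 3900), 462: (725, 1180),
    464: (2821, 6400), 471: (574, 1510), 473: (806, 1550), 483: (1048, 2870),
}

# ST scores from the post: test -> (Graviton2, EPYC 7742)
st = {
    400: (30.05, 43.7), 401: (19.21, 27.2), 403: (34.49, 42.6), 429: (29.4, 39.6),
    445: (27.73, 32.7), 456: (48.5, 60.5), 458: (25.93, 27.6), 462: (93.79, 72.3),
    464: (46.05, 60.4), 471: (23.19, 23.0), 473: (19.84, 25.4), 483: (32.21, 47.8),
}

# MT: compare against half of the 2P result (the step the broken formula skipped)
mt_pct = [g2 / (epyc_2p / 2) * 100 for g2, epyc_2p in mt.values()]
# ST: direct ratio, no halving needed
st_pct = [g2 / epyc * 100 for g2, epyc in st.values()]

mt_mean = mean(mt_pct)   # arithmetic mean, as in the post
st_mean = mean(st_pct)
mt_geo = geometric_mean(mt_pct)  # alternative summary, not used in the post
st_geo = geometric_mean(st_pct)

print(f"MT arithmetic mean: {mt_mean:.2f}%  geometric mean: {mt_geo:.2f}%")
print(f"ST arithmetic mean: {st_mean:.2f}%  geometric mean: {st_geo:.2f}%")
```

The arithmetic means reproduce the 79.84% and 83.82% figures in the tables; the geometric mean is less sensitive to outliers such as 462, so it comes out somewhat lower.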
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
However, I definitely made a huge error in Excel - somehow the formula didn't copy and it didn't divide by two. Should have double checked.
Graviton2 is 80% the speed of 1/2 of a 2P 7742 by my calculations.
That matches the 70-80% range I'd expect when comparing 64-core vs. 64-core. The 83% average is even a bit better, despite the small 32 MB L3$.

Zen2/Rome has 128 threads ... so Graviton2's performance per thread is 1.6x higher (and A77 will provide 2.1x).
Zen1/Naples has 64 threads and provides 115% of the performance ... so Graviton2's performance per thread is 1.7x higher.
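
The per-thread arithmetic above can be sketched as below. The helper name `per_thread_ratio` is hypothetical. One assumption to flag: the 1.7x Naples figure only falls out if the 2P Naples system is counted as 128 threads (64 cores with SMT), so that is what the sketch uses; with 64 threads the ratio would be below 1.

```python
def per_thread_ratio(rel_throughput, threads_a, threads_b):
    """Per-thread performance of system A relative to system B.

    rel_throughput is A's total throughput divided by B's total throughput.
    """
    return rel_throughput * threads_b / threads_a


# Graviton2 (64 threads) at ~80% of a 1P EPYC 7742 (64 cores, 128 threads)
g2_vs_rome = per_thread_ratio(0.80, threads_a=64, threads_b=128)

# Naples delivering 115% of Graviton2's throughput means G2 is at 1/1.15.
# ASSUMPTION: 2P Naples counted as 128 threads, which is what yields ~1.7x.
g2_vs_naples = per_thread_ratio(1 / 1.15, threads_a=64, threads_b=128)

print(f"G2 per-thread vs Rome:   {g2_vs_rome:.2f}x")
print(f"G2 per-thread vs Naples: {g2_vs_naples:.2f}x")
```

Note that this compares hardware threads, so an SMT thread on x86 is weighed against a full A76 core; a per-core comparison would halve both ratios.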

  • Zen2: 225 W TDP, price $7500
  • Zen1: 2x180 W TDP, price 2x$3300
  • Gra2: 90 W TDP, price $300?

x86 is expensive to buy (CPU 20x, overall system about 2x?).
x86 is expensive to run (1.5-2.0x).
x86 offers lower performance per thread (1.6x; with SMT off, x86 will lose 20% of MT performance).

Even Zen4 at 5nm cannot beat Gra2 from an economic point of view, nor in performance per thread.
How does x86 want to fight this?
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
Seriously? Where did you come up with that number?
Radeon VII costs $550 ... die size 331 mm² at TSMC 7nm.
- if we exclude margins and memory cost, then the real price of the silicon is about $200-300 max.

Gra2 die size:
  • CPU: 64 × 1.4 mm² = 90 mm², plus 34 mm² L3$ ... that's 124 mm²
  • Memory controller, IO, PCIe links ... (the EPYC Rome IO die is 416 mm² on 14nm) ... so 200 mm² at 7nm
  • Total ... 330 mm² (similar to the Radeon VII GPU)
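
The die-size estimate in the list above is simple addition; a minimal sketch of it follows. Every input is the poster's own guess (per-core area, L3 area, and the scaled-down IO area), not a measured figure, and the variable names are illustrative.

```python
# All inputs are the poster's estimates, in mm^2 (not measured values)
CORE_COUNT = 64
AREA_PER_CORE = 1.4      # assumed A76-class core area at 7nm
L3_AREA = 34.0           # assumed area of the L3 cache
IO_AREA_7NM = 200.0      # poster's guess for memory controller + IO + PCIe at 7nm

core_area = CORE_COUNT * AREA_PER_CORE    # 89.6, rounded to 90 in the post
cpu_plus_l3 = core_area + L3_AREA         # 123.6, rounded to 124 in the post
total_area = cpu_plus_l3 + IO_AREA_7NM    # 323.6, rounded to 330 in the post

print(f"cores: {core_area:.1f} mm^2")
print(f"cores + L3: {cpu_plus_l3:.1f} mm^2")
print(f"total: {total_area:.1f} mm^2")
```

The result is dominated by the 200 mm² IO guess, so the comparison with the 331 mm² Radeon VII die is only as good as that one number.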


Even if Graviton2 were 500 mm² and cost $800 ... it's still dirt cheap compared to any x86 CPU.
 

Gideon

Golden Member
Nov 27, 2007
1,608
3,570
136
Seriously? Where did you come up with that number?
Yeah, this is flat-out trolling with zero evidence. Sigh this is getting tedious.

And again with the ALU counts ...

This is the annotated die-shot of Zen 2 core (from this post). The SIMD units doubled from Zen 1 but ALU block remained about the same (they even added one AGU instead).

Boy aren't AMD chip-designers idiots look how much free space there is for ALUs! AMD should just replace everything with a huge ALU block, I bet they could fit a couple hundred in there (OK maybe save a little for SMT-16), my wouldn't that be a fast chip!!

/s

[Image: annotated die shot of the Zen 2 core]
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
STOP

Radeon VII is nothing but a Mi50 sold on the consumer market. Does a Mi50 cost $550? No. Rethink your "reasoning".
Do you think that AMD pays TSMC $700 for Vega2 silicon and sells it for $550? Do you think they are losing money there? That's not possible; dumping is illegal. STOP and rethink the difference between real silicon cost and final price in different markets (nV focuses on compute because the margins there are huge compared to the consumer market).


Boy aren't AMD chip-designers idiots look how much free space there is for ALUs! AMD should just replace everything with a huge ALU block, I bet they could fit a couple hundred in there (OK maybe save a little for SMT-16), my wouldn't that be a fast chip!!
If you talk about hundreds of ALUs, you have no clue what you are talking about.
And yes, the number of execution units in the back end is one of the important metrics. It always has been, no matter the ISA.
That's the reason why ARM is so strong. They moved from the 2xALU A73 to the much wider A76 design with 3xALU+1xJump pretty fast, and one year later came A77 with 4xALU+2xJump (a lot of instruction throughput was doubled in A77). That evolution is much faster than what we see in the x86 world.
 

DrMrLordX

Lifer
Apr 27, 2000
21,582
10,785
136
Do you think that AMD pays TSMC $700 for a Vega2 GPU and sells it for $550?

Do you think AMD pays TSMC $7500 (or more) for a Rome2 chip? Do you think Intel pays $20k to crank out Xeon Platinum systems that get wrecked by a 3990x? No! Amazon isn't even going to sell Graviton2 to anyone, and if they did, they'd slap on the same huge margins that every other server hardware player expects before taking into account targeted discounts.

You can't say that Rome costs $7500 and then claim that Graviton2 is only a $300 CPU. That's completely insane.
 

Gideon

Golden Member
Nov 27, 2007
1,608
3,570
136
If you talk about hundreds of ALUs, you have no clue what you are talking about.
Yes, this was heavy sarcasm directed at the "moar ALUs moar SMT" meme you keep bringing up in every single post.

Overall I agree that ARM itself (and ARM custom designs) have been executing much better than x86 for a while (in terms of added performance per year).

It has also been obvious for a while that clock speeds won't really improve much with forthcoming nodes, yet core area will shrink (adding hotspots). Therefore the only way to gain more single-threaded performance is to go with wider cores. Both of the x86 players need to do it (and they will).

This does not mean that "AMD has to add X ALUs to design Y by time Z!" Jamming more execution units into a design alone might very well make it worse. Just look at recent Samsung Exynos models. They are wider than Snapdragon, yet perform worse and use more power.

EDIT: And btw, I wouldn't mind at all if Zen 3 added an extra ALU (it would have to be coupled with extra AGUs as well). But they would have to redesign the chip significantly for it to show any benefit at all, rather than just limiting clocks and burning more power, or just wasting die space as was the case with Phenom:

AMD claims the third ALU/AGU pair went mostly unused in Phenom II, and as a result it's been removed from Bulldozer.
 

amrnuke

Golden Member
Apr 24, 2019
1,181
1,772
136
Do you think that AMD pays TSMC $700 for Vega2 silicon and sells it for $550? Do you think they are losing money there? That's not possible; dumping is illegal. STOP and rethink the difference between real silicon cost and final price in different markets (nV focuses on compute because the margins there are huge compared to the consumer market).
Exactly, and Amazon will do the same for G2. The silicon itself may only cost $300 (but really, who knows?), but they have to pay the engineers who designed it and fund further R&D. IMO they should price G2 instances just below x86 instances. Pricing them according to what they themselves paid for the silicon would be silly; it would be wasted profit potential.
 

Gideon

Golden Member
Nov 27, 2007
1,608
3,570
136
@Richie Rich

Yup, x86 stole your lunch money, we get it. When Zen 3 comes out and is SMT2 only, will you admit you were wrong? When x86 is alive and well in a year or two, will you admit you were wrong?
Exactly. I'm very much willing to believe that x86 is in trouble in long term. Nuvia seems to have the tech and marketing people to start converting hyperscalers in a couple of years etc ...

But it all takes time!

It took Intel almost a decade to take over the Data Center market, and another decade to conquer the HPC world. Even if ARM offers 2x perf/watt tomorrow it will only make dents in the market share for years (just look at the situation with Rome, which is a lot more like a drop-in replacement). Though as with most things, when ARM finally does take off, it will probably move much quicker than people expect. But even then there will still be use cases where x86 isn't replaced for a decade or so (legacy codebases, etc.).

Intel-Business.jpg
 

Richie Rich

Senior member
Jul 28, 2019
470
229
76
Exactly. I'm very much willing to believe that x86 is in trouble in long term. Nuvia seems to have the tech and marketing people to start converting hyperscalers in a couple of years etc ...

But it all takes time!

It took Intel almost a decade to take over the Data Center market, and another decade to conquer the HPC world. Even if ARM offers 2x perf/watt tomorrow it will only make dents in the market share for years (just look at the situation with Rome, which is a lot more like a drop-in replacement). Though as with most things, when ARM finally does take off, it will probably move much quicker than people expect. But even then there will still be use cases where x86 isn't replaced for a decade or so (legacy codebases, etc.).

Intel-Business.jpg
Nice pictures!
Regarding time: if you look carefully, you will see x86 took a 50% share (the majority) in just 4 years, in both data centers and HPC. ARM is able to take the majority in 4 years, maybe even faster, because Amazon doesn't need to convince anybody in this long HW chain.

Amazon is the biggest cloud provider, with 34% market share worldwide. It depends how long customers are willing to pay double the price for x86. This market dynamic is hard to predict, but it will not take a decade. The next competitors, MS Azure with 18% and Google Cloud with 8%, have to stay on x86 for a while (at double the price). Amazon can grow quickly up to 50% market share, let's say in about two years. MS and Google will have to follow and buy Ampere Altra ARM systems to stay competitive on price. It looks like ARM can take 50% market share in just two years. The internetless '90s were much slower in every way than today.

@amrnuke
What R&D do you mean? Graviton2 is based on ARM's licensed Neoverse architecture. The memory controller, IO and PCIe are licensed too. Assembling all of those together is way easier and faster than AMD's complete R&D. ARM's flexible license costs $75,000 per year; sure, N1 will be more expensive, but the server market needs 12 million units per year. If Amazon aims for 10% market share, that is 1.2 million G2 units per year. Even if Amazon pays $10 million for all licenses, that would cost about $10 per CPU, which is nothing.
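
The amortization argument in this post can be written out as a quick sketch. All inputs are the poster's assumptions (market size, target share, and a hypothetical $10 million total license bill); the variable names are illustrative.

```python
# All figures below are the poster's assumptions, not known Amazon or ARM numbers
SERVER_CPUS_PER_YEAR = 12_000_000   # assumed total server CPU market, units/year
TARGET_SHARE = 0.10                 # assumed Graviton2 share of that market
LICENSE_COST_USD = 10_000_000       # hypothetical total yearly license cost

units_per_year = SERVER_CPUS_PER_YEAR * TARGET_SHARE
license_cost_per_cpu = LICENSE_COST_USD / units_per_year

print(f"units per year: {units_per_year:,.0f}")
print(f"license cost per CPU: ${license_cost_per_cpu:.2f}")
```

With these inputs the per-CPU figure lands a little under the $10 quoted in the post. It covers licensing only; engineering salaries, masks and validation would be amortized the same way on top of it.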
 

DrMrLordX

Lifer
Apr 27, 2000
21,582
10,785
136
IMO they should price G2 instances just below x86 instances.

Depends on what instances they want to support in the future. AWS is big enough that they can try to force customers onto their own in-house tech if it meets their future needs. As it stands they seem to have priced Graviton2 instances low, probably to cope with the relatively-higher demand for x86 as an ISA (moving to a Graviton2 instance incurs the costs of converting your codebase).

It depends how long customers are willing to pay double the price for x86.

They aren't paying double the price for x86. From Anandtech's review of existing AWS instance costs per workload:


Note that Rome is not included. But without taking into account the costs/difficulties of converting the codebase/switching to new software, Graviton2 instances are 40% cheaper on average, not 50% cheaper. For real-world workloads, you might have to switch to an entirely different suite of software and/or rewrite stuff in-house to switch. Some FOSS makes the leap easily. In-house stuff has to be recompiled, at least. Anything using advanced SIMD instructions needs a rewrite. Plenty of other software would need a port. And remember, Rome is not included. Nor is IceLake-SP (gee, wonder why).
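
To make the "40% cheaper, not 50%" distinction concrete, here is a small sketch converting a discount into the price multiple paid for staying on x86. The function name `x86_price_multiple` is hypothetical; the 40% and 50% inputs are the figures discussed in the posts above.

```python
def x86_price_multiple(graviton_discount):
    """Price of the x86 instance as a multiple of the Graviton2 instance price.

    graviton_discount is the fraction by which Graviton2 is cheaper (0.40 = 40%).
    """
    return 1.0 / (1.0 - graviton_discount)


double_price_claim = x86_price_multiple(0.50)   # "double the price" implies 50% cheaper
review_average = x86_price_multiple(0.40)       # 40% cheaper, per the review average

print(f"50% cheaper -> x86 costs {double_price_claim:.2f}x")
print(f"40% cheaper -> x86 costs {review_average:.2f}x")
```

So a 40% average discount means x86 customers pay roughly 1.67x rather than 2x, before counting any porting cost on the Graviton2 side.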
 

RetroZombie

Senior member
Nov 5, 2019
464
386
96
According to these guys:
How Intel Stole the x86 Market
Top Epyc costs $400 to make
Intel four-core version: $55
Intel high core count die: $135 to $165
Intel extreme core count die: $178 to $200

However, I don't know how they calculate the price.
Is it just TSMC manufacturing cost?
Does it include R&D?
Amortization over time?
Do manufacturing costs go down over time?

And how are you guys calculating prices?

Let me just go with some software cost examples:
- Microsoft Office, no matter if it's just released or about to be discontinued, always costs the same, and it doesn't matter if Microsoft sells 100M copies or 1000B copies; the price is fixed.
- Games: a just-released AAA game is $70, a few months later $50, after about 1-2 years $20, after 3-5 years $5.
 

Hitman928

Diamond Member
Apr 15, 2012
5,177
7,628
136
According to these guys:
How Intel Stole the x86 Market
Top Epyc costs $400 to make
Intel four-core version: $55
Intel high core count die: $135 to $165
Intel extreme core count die: $178 to $200

However, I don't know how they calculate the price.
Is it just TSMC manufacturing cost?
Does it include R&D?
Amortization over time?
Do manufacturing costs go down over time?

And how are you guys calculating prices?

Let me just go with some software cost examples:
- Microsoft Office, no matter if it's just released or about to be discontinued, always costs the same, and it doesn't matter if Microsoft sells 100M copies or 1000B copies; the price is fixed.
- Games: a just-released AAA game is $70, a few months later $50, after about 1-2 years $20, after 3-5 years $5.

I'm not going to listen to this for over an hour, can you point out the relevant timestamp? I got a couple of minutes in and his cost analysis was fairly. . . pedestrian.
 

Markfw

Moderator Emeritus, Elite Member
May 16, 2002
25,478
14,434
136
I'm not going to listen to this for over an hour, can you point out the relevant timestamp? I got a couple of minutes in and his cost analysis was fairly. . . pedestrian.
I agree. You can't compare cost at 22 nm and 5 nm (not that he used those exactly). The point is, just for the mfg part, the cost goes up exponentially with each node. He is saying things like "take the retail, divide by 2 and that's the high discounted price", etc... No real numbers.
 