News ARM Server CPUs

Gideon · Mar 16, 2020

With all the upcoming ARM servers (Let's not forget Nuvia, etc) it probably makes sense to have 1 thread from them all, instead of creating new ones for each announcement (if Not, I will rename it).

However:
Anandtech: Marwell Announces 3rd Gen Arm Server Thunder X3: 96 Cores/384 threads
ServeTheHome: Marvell ThunderX3 Arm Server CPU with 768 Threads in 2020

Could be pretty impressive, though 25% Single Threaded performance gain seems a bit meh, compared to X2, which @2.5Ghz was ~50% slower than Xeon @ 3.8Ghz

But Their marketing slides sure show potential:

NTMBK · Apr 28, 2021

ThatBuzzkiller said:
ARM may have a more accessible ISA license but it doesn't get anymore contributions outside of the licensing fees. There is no path for ARM to universally succeed since there are self-absorbed corporations like Apple who don't care about participating in industry standards with their most recent 5G debacle as an indication and Nvidia does not share their technology if ever. If other ARM vendors did care about ARM architecture in general then we'd have Apple licensing their CPU designs for others and Nvidia licensing their GPUs out to other ARM vendors at reasonable rates ... (anyone can tell that neither Apple or Nvidia has many friends in the industry)

Instead what we have is ARM vendors like Samsung cooperating (?!) with x86 vendors such as AMD and Intel opening up their fab services to other ARM vendors as well which suggest that either x86 vendors are more trustworthy in general or ARM vendors have very bitter relationships towards each other ...

ARM license terms forbid customers from licensing their custom CPU designs out. They don't want competition for their own IP licensing business.

NVidia did try an "IP licensing" scheme a few years ago, but nobody took them up on it... probably because NVidia have a tendency of pissing off their partners.

ThatBuzzkiller · Apr 28, 2021

Gideon said:
1. The console vendors have always broken backwards compatiblity when it offered decent performance gains

Not anymore, at least in Microsoft or Sony's case. They are determined to stay on AMD CPU/GPUs for future console platforms. Maintaining backwards compatibility from a hardware standpoint also has positive benefits for complex projects like AAA games since publishers will be able to pull off faster release schedules for new hardware ...

Gideon said:
2. By next gen they should be easily able to run current gen games under emulation (especially if it's an AMD's ARM CPU with a tightly integrated FPGA on board for exotic instructions).

CPU emulation isn't going to be the main problem. It's the GPU emulation that's going to be the issue with modern consoles depending on how many scary hardware features games are going to be using ...

Have you even looked at what AMD GPUs are capable of in Vulkan ? None of the other ARM vendor's GPUs even come close and there's far more low-level access to be had with console gfx APIs like GNM or Xbox D3D ... (even Nvidia GPUs aren't a great fit for emulating AMD GPUs when we look at projects like VKD3D which are practically cloning AMD's D3D12 drivers)

Gideon said:
I mean, just take a look at this video.
This is Metro Exodus (2019):

Running x86 code under rosetta 2 emulation

On a fanless Arm M1 laptop with an integrated GPU

at 1080p with Medium settings at around 30FPS (the entire video with both indoor and outdoor scenes)

*snip*

Show me one x86 laptop that can do the same with integrated graphics, especially in a fanless chassis?

96EU Intel Iris Xe in the latest tiger lake can only get 30FPS at 720p with Low settings (see this video) and it will run at ~15-20FPS on the same settings on 28W laptops, confirmed by notebookcheck's results:

*snip*

It only manages to do 15FPS (on an asus Zenbook 14 ) and 20 FPS (on an Acer Swift 5) both running with a 28W TDP (that shortly draws up to 58W with 100% CPU and GPU load) . A far cry from a fanless.

Well first of all we aren't comparing like for like scene/area and second of all Metro Exodus is just one sample so you're going to have to give more data than this. Lastly, the M1 also has a process advantage so I wouldn't be surprised if the M1's GPU came out ahead down to this fact ...

Gideon said:
Therefore a decent ARM CPU 3-5 years from now shoud run PS5 x86 games under emulation. Yes It requires hardware investment (like Apple has memory synchronized memory instructions pretty much only for x86 emulation). And It might be only the most popular games requiring patches, but it can.

ARM custom core licences don't allow licencing their cores onwards. Only designing cores for their own products.

Again, it's GPU emulation that's going to be the roadblock as I pointed up in the above ...

As for licensing, there's a relatively simple legal fix. Why doesn't Apple broker another agreement with ARM Ltd again so that they can share their CPU designs for other ARM vendors ? A legal excuse is just an easy way out for Apple to not share their technology ...

Thala · Apr 28, 2021

ThatBuzzkiller said:
CPU emulation isn't going to be the main problem. It's the GPU emulation that's going to be the issue with modern consoles depending on how many scary hardware features games are going to be using ...

Get real! GPU will be a non issue. You only have to support a superset of features on API level - and we talking about GPUs not existing today. I am sure that todays more exotic (API level) features are pretty commonly supported in 5 years from now.

Funny, that you mentioned Radeon RX 6700 XT VK features - because NVidia Tegra X2 VK features are clearly a superset, despite being an ARM SoC.

NTMBK · Apr 28, 2021

ThatBuzzkiller said:
As for licensing, there's a relatively simple legal fix. Why doesn't Apple broker another agreement with ARM Ltd again so that they can share their CPU designs for other ARM vendors ? A legal excuse is just an easy way out for Apple to not share their technology ...

Why the hell would ARM agree to that? They want customers buying their Neoverse IP, they don't want to have to compete with Apple's CPU designs.

DrMrLordX · Apr 28, 2021

I would like to point out that while Andrei's excellent article highlights how ARM is attacking the HPC and datacentre worlds in full force, it really says nothing about the desktop/workstation scene. So to the extent that anyone is bringing that into play . . . it just doesn't fit here, at least not based on the topic and not on the V1 and N2 announcements.

If you want to see credible commodity efforts towards ARM desktop/workstation, I would expect it from Qualcomm + Nuvia. There's always Apple but, you know, walled garden. To an extent they may as well not exist.

Gideon · Apr 28, 2021

ThatBuzzkiller said:
Not anymore, at least in Microsoft or Sony's case. They are determined to stay on AMD CPU/GPUs for future console platforms. Maintaining backwards compatibility from a hardware standpoint also has positive benefits for complex projects like AAA games since publishers will be able to pull off faster release schedules for new hardware ...

We shall wait and see. If the difference in 5 years is anywhere near what Intel vs M1 currently is, It's all but guaranteed to change.

Past decisions have not been very good indications of future ones for console before.

I remember people being hostile on this very forum when I mentioned sometime in 2013 that this generation we'll probably see something akin to PS4 Pro as it's x86 and PS4 will be too weak in ~5 years (and much easier to upgrade). At least some posters got angry for posting such a "stupid idea" , "everybody knows console perf is fixed for the entire generation" (always had been!).

ThatBuzzkiller said:
CPU emulation isn't going to be the main problem. It's the GPU emulation that's going to be the issue with modern consoles depending on how many scary hardware features games are going to be using ...

Not a problem if it uses AMDs GPU + ARM cores.

ThatBuzzkiller said:
Well first of all we aren't comparing like for like scene/area and second of all Metro Exodus is just one sample so you're going to have to give more data than this. Lastly, the M1 also has a process advantage so I wouldn't be surprised if the M1's GPU came out ahead down to this fact ...

You are grasping at straws here. No way it's the CPU:
1. It's running in x86 emulation mode, guaranteed to be a 20-30% hit at least
2. It's running against Tiger Lake (which is faster in geekbench ST vs M1 using Rosetta)
3. On top of that it's using MoltenVK to convert Vulcan calls to Metal adding another layer of abstraction (and slowdowns)

And It's not because of different scenes either. I posted a 20 minute M1 gameplay video of the Moscow levels having both indoor and outdoor scenes, with the framerate hovering around 30FPS near constantly. The framerate in these levels is quite stable in these areas (meaining the scene doesn't matter that much, know as I have played it. In at least two youtube videos Xe is only able to hit 30FPS at 720p Low (and this is backed by notebookcheck) in scenes taking place in the first few levels.

And the process excuse is also getting tiresome. Apple's A13 was still in the same performance ballpark A14 is, and it was made on TSMC 7nm (perf was down -20% mostly due to clocks, that might not have been an issue in a laptop). And yes, Apple's GPU benefits more from 5nm, but Tiger Lake is pulling 28 - 50W in challenging games. Air is on passive cooling and has at least 1/3 better performance despite emulation. This is way more than one shrink alone would do.

ThatBuzzkiller said:
As for licensing, there's a relatively simple legal fix. Why doesn't Apple broker another agreement with ARM Ltd again so that they can share their CPU designs for other ARM vendors ? A legal excuse is just an easy way out for Apple to not share their technology ...

And why should ARM be interested in changing the licence just to lose a huge chunk of their core licencing income (as Apple's cores are better)?
Even if they were, it would mean that the licence would be that much more expensive for Apple, why should they want it? Just to generate more work for them making their designs licencable and selling them?

ThatBuzzkiller · Apr 28, 2021

Thala said:
Get real! GPU will be a non issue. You only have to support a superset of features on API level - and we talking about GPUs not existing today. I am sure that todays more exotic (API level) features are pretty commonly supported in 5 years from now.

Maybe you should get real if you actually looked at the supported optional Vulkan features on other GPU vendors ?

Most mobile GPU designs are still years behind AMD in terms of supported features. There's this one really technologically exotic game such as Dreams on the PS4 which relies on a really obscure mechanism known as ordered append for doing point cloud rendering in that game which is NOT available on any public gfx API because no GPUs aside from AMD supports that feature!

There's this other really scary feature on RDNA2 where you can run vertex/tess/geometry/mesh programs on their primitive shader. Nvidia has two geometry pipelines since they can't run vertex/tess/geometry programs on their mesh shading pipeline so they can't properly emulate primitive shaders either ...

NTMBK said:
Why the hell would ARM agree to that? They want customers buying their Neoverse IP, they don't want to have to compete with Apple's CPU designs.

I don't mean Apple directly competing against ARM Ltd if that's what you're thinking. I mean that ARM Ltd should get a cut too if Apple does want to share their CPU designs but I don't imagine that Apple wants to share profit either ...

jpiniero · Apr 28, 2021

Given that MS is rumored to be developing their own server CPU, a future console using MS's developed cores shouldn't be that unusual. What GPU they would use I have no clue.

And they do have a long term BC strategy - xCloud.

Gideon · Apr 28, 2021

jpiniero said:
Given that MS is rumored to be developing their own server CPU, a future console using MS's developed cores shouldn't be that unusual. What GPU they would use I have no clue.

And they do have a long term BC strategy - xCloud.

Yeah between all the other things differenciation is another thing why I see this happening. XBox vs PS one has a much bigger chance to be differ if they beat the oter to a new/better microarchitecture.

I consider myself a longtime AMD fan (and Desktop PC performance tuning) so I'ts not like I want this to happen, far from it. It's just that several writings are on the wall that people are ignoring because they'd like the status-quo to continue.

DrMrLordX · Apr 28, 2021

jpiniero said:
Given that MS is rumored to be developing their own server CPU

That seems a certainty at this point. How long they stick with Neoverse-derived tech remains to be seen, but they definitely want in on the same action as Amazon and Google.

a future console using MS's developed cores shouldn't be that unusual. What GPU they would use I have no clue.

Nobody capable of selling them GPU tech would do so on favorable terms unless MS agreed to just buy the entire platform. Unless MS wants to try Imagination. If they do, God help them.

moinmoin · Apr 28, 2021

Gideon said:
Anandtech has an excellent article about Neoverse V1:

This slide definitely shows better than most others that AMD and Intel need a solid long-term plan to counter ARM (It includes Q1 2021 btw).

1. Hyperscalers are > 50% of the entire enterprice market, and they are integrating vertically into ARM surprisingly fast, you can't just ignore it.
2. Apple is gone (for x86) and Qualcomm (and Samsung it seems) are coming to laptops next year.
3. Nvidia is pushing ARM gaming, it's still in babysteps but in ~2023 timeframe they might have some solid offerings around.
4. previous point means that consoles actually have a decent chance of moving to ARM as well.

I'm not sayng all of the above will 100% pan out, but IMO just trying to out-execute competition by improving x86 processors doesn't seem to be enough in the longer term (5 - 10 year timeline).
I wonder what is? Start investing in RISC-V infrastructure looks like an avenue.

AMD and Intel definitely need to take ARM serious as a competitor, so both need to improve much in power efficiency. I think AMD's current silence owes to that. Imo Intel is who we need to worry about.

That said I'd not take AWS' Graviton stats as an indication that ARM is going to take the datacenters by storm in general now. This is currently very isolated to AWS with its custom Graviton servers. What we are currently seeing is the big hyperscaler boys all doing their own ARM projects. That gives each of them a potential competitive advantage. But the bad thing for the ARM ecosystem overall is that this fragmentation prevents the economy of scale from happening in the ARM space. ARM is the toy of the day for these hyperscalers, and in the worst case they'll drop it again at some later point, with the rest of the ARM ecosystem never having profited from the increase of market share as the open market was completely locked out.

Best case for ARM is that with hyperscalers being early adopters there will also be more demand and more offers on the open market. And that is the real danger AMD and Intel face. But that's not what's happening just yet.

Doug S · Apr 28, 2021

Gideon said:
Yes not right now, but during the next cycle in ~5 years they very well might.

IMHO this is a weak argument because:

1. The console vendors have always broken backwards compatiblity when it offered decent performance gains.
2. By next gen they should be easily able to run current gen games under emulation (especially if it's an AMD's ARM CPU with a tightly integrated FPGA on board for exotic instructions).

I mean, just take a look at this video.
This is Metro Exodus (2019):

Running x86 code under rosetta 2 emulation

On a fanless Arm M1 laptop with an integrated GPU

at 1080p with Medium settings at around 30FPS (the entire video with both indoor and outdoor scenes

You're assuming someone else can replicate Rosetta 2's performance, which I don't think is a good assumption. It is the first time anyone has implemented static translation of binaries from one ISA to another, it isn't just a performance improvement using the same old JIT everyone has always been using. Apple has no doubt been working on this for many years, perhaps as long as a decade.

Apple also had some pretty big advantages in making this happen, since the Mac is essentially a developer monoculture environment - pretty much everyone uses Apple's dev tools to build Mac apps. So they had the opportunity long ago to insure that the code generated would be optimal for Rosetta 2. Not only that, they retired both 32 bit code and a bunch of outdated APIs shortly before announcing the ARM Macs and Rosetta 2, further simplifying the task and helping the process along.

Console vendors have a similar developer monoculture, but paving the way like Apple did would have required having those things in the dev tools for the PS5 / Xbox series X at launch - i.e. knowing that they might switch ISAs in the PS6 / Xbox "Y", which implies planning at least for the possibility of switching off x86 years before their release. I'd be surprised if they were thinking that way.

Console game developers are also more likely to take performance shortcuts using things that will trip up static translation, like undocumented APIs, self modifying code, etc.

Schmide · Apr 28, 2021

Doug S said:
self modifying code

Yeah that's been persona non grata for decades...

oop can't do that, everything is in flight

Nothingness · Apr 28, 2021

Doug S said:
You're assuming someone else can replicate Rosetta 2's performance, which I don't think is a good assumption. It is the first time anyone has implemented static translation of binaries from one ISA to another, it isn't just a performance improvement using the same old JIT everyone has always been using. Apple has no doubt been working on this for many years, perhaps as long as a decade.

FX!32 was also doing some form of static translation a quarter of century ago (IA32 to Alpha). Though I'm impressed by what Apple achieved, it could be argued Digital's task was even more difficult than Apple's one given the total lack of "control" over applications for Windows; and also Apple added some HW support to help speed up things. So great engineering achievement from Apple, but nothing really new or that can't be reproduced.

Thala · Apr 28, 2021

ThatBuzzkiller said:
Most mobile GPU designs are still years behind AMD in terms of supported features. There's this one really technologically exotic game such as Dreams on the PS4 which relies on a really obscure mechanism known as ordered append for doing point cloud rendering in that game which is NOT available on any public gfx API because no GPUs aside from AMD supports that feature!

With no word did I refer to mobile GPU designs, nor any designs existing today. Of course it will be a GPU with the requirement list written by Sony/MS/Nintendo.
At this point i am not even sure what you are thinking - do you believe Micosoft will just take a few vanilla MALI cores from 2021 in order to build the next XBox?

Thala · Apr 28, 2021

Doug S said:
You're assuming someone else can replicate Rosetta 2's performance, which I don't think is a good assumption. It is the first time anyone has implemented static translation of binaries from one ISA to another, it isn't just a performance improvement using the same old JIT everyone has always been using. Apple has no doubt been working on this for many years, perhaps as long as a decade.

Microsoft did it with x64 emulation in Windows - and this is even with a JIT approach. So I do not believe static translation is even necessary if you can achieve similar speeds with JIT. Before Microsoft introduced x64 emulation, i was also assuming that JIT cannot easily break the 50% performance barrier - but apparently i was wrong.

Doug S · Apr 28, 2021

Nothingness said:
FX!32 was also doing some form of static translation a quarter of century ago (IA32 to Alpha). Though I'm impressed by what Apple achieved, it could be argued Digital's task was even more difficult than Apple's one given the total lack of "control" over applications for Windows; and also Apple added some HW support to help speed up things. So great engineering achievement from Apple, but nothing really new or that can't be reproduced.

Not really, FX!32 was simply a JIT that cached its results and optimized/saved them for future runs. That was new 25 years ago but is par for the course for JITs these days.

Here's a snippet from the Usenix paper abstract on it:

The translator provides native Alpha code for the portions of an x86 application which have been previously executed.

DrMrLordX · Apr 28, 2021

moinmoin said:
Best case for ARM is that with hyperscalers being early adopters there will also be more demand and more offers on the open market. And that is the real danger AMD and Intel face. But that's not what's happening just yet.

Right now, there's only one "at large" ARM server player left, and that's Ampere. If they get a foothold then yeah the x86 world faces some trouble. If they get bought out by Google or Microsoft then maybe not so much.

Gideon · Apr 28, 2021

I would actually much prefer if instead of the current situation say Ampere (servers) and Qualcomm (laptops) or Nvidia (desktops) would be the ones getting orders, instead of vertical integration (just no nvidia buying ARM). Yeah ARM desktops for gaming would be a joke for a while, at least in terms of compatiblity, but they would get there and all would benefit. If it's Google, Amazon, Microsoft all going their own version of "walled garden" it will help people much less.

I wouldn't like them to push x86 aside but it would be very welcome as a strong alternative.

moinmoin · Apr 28, 2021

DrMrLordX said:
Right now, there's only one "at large" ARM server player left, and that's Ampere. If they get a foothold then yeah the x86 world faces some trouble. If they get bought out by Google or Microsoft then maybe not so much.

Right. I was actually trying to refer to economy of scale again, especially using that to cover as many markets as possible, which is the big advantage x86 (especially AMD, not so much Intel anymore) currently enjoys and ARM somehow doesn't manage due to the fragmentation between plenty specialized ARM licensees which all don't really expand beyond their area of expertise (Qualcomm was closest before shuttering Centriq).

yuri69 · Apr 28, 2021

AWS priced their ARM-based instances pretty cheap even compared to already cheap AMD-based EC2s. Cutting expenses is always on the top of the priority list when you are a cloud subscriber. This means AWS really wanted to migrate their subscribers to the ARM-based instances. Hence the 2020 numbers.

Was it really due to lower TCO and/or other benefits compared to x86? Who knows.

Schmide · Apr 28, 2021

This boggled me a bit.

Counting the number of operands in a vector to make a bigger graph isn't really semantically correct.

You are doing the same number of vectors per clock.

Subdivisions, be cool or be cast out

ThatBuzzkiller · Apr 28, 2021

Thala said:
With no word did I refer to mobile GPU designs, nor any designs existing today. Of course it will be a GPU with the requirement list written by Sony/MS/Nintendo.
At this point i am not even sure what you are thinking - do you believe Micosoft will just take a few vanilla MALI cores from 2021 in order to build the next XBox?

Even GPUs from the future won't be good enough because there are STILL features exclusive to AMD GPUs for almost 10 years with no rough equivalent yet!

We have features like command buffer prediction on Mantle, indirect command generation for changing PSOs on Xbox, and they have a single instruction for testing BVH intersections (can't be emulated with fixed func ray traversal HW on NV), etc ... (AMD also natively supports indexing into the register file as well)

GPU emulation isn't going to be made trivial over time alone ...

Thala · Apr 28, 2021

ThatBuzzkiller said:
Even GPUs from the future won't be good enough because there are STILL features exclusive to AMD GPUs for almost 10 years with no rough equivalent yet!

We have features like command buffer prediction on Mantle, indirect command generation for changing PSOs on Xbox, and they have a single instruction for testing BVH intersections (can't be emulated with fixed func ray traversal HW on NV), etc ... (AMD also natively supports indexing into the register file as well)

GPU emulation isn't going to be made trivial over time alone ...

As i said, you will not emulate a GPU on instruction set level, you just need compatibility on API level. Shader compilers doing the heavy lifting.

ThatBuzzkiller · Apr 28, 2021

Thala said:
As i said, you will not emulate a GPU on instruction set level, you just need compatibility on API level. Shader compilers doing the heavy lifting.

That's not true for most consoles ... (the HLE strategy you mentioned isn't going to work for modern consoles)

Console games do offline compilation for their shaders and ship native GPU bytecode which is why runtime shader/pipeline compilation (increased load times/stutter) doesn't exist on consoles compared to PC ...

Console gfx APIs like GNM or Xbox D3D are statically linked (anti-HLE) too so it's totally pointless to reimplement APIs when games emit PM4 packets. You obviously have no understanding about GPU emulation at all ...

News ARM Server CPUs

Platinum Member

Lifer

Golden Member

Golden Member

Lifer

Lifer

Platinum Member

Golden Member

Lifer

Platinum Member

Lifer

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Golden Member

Golden Member

Diamond Member

Lifer

Platinum Member

Diamond Member

Senior member

Diamond Member

Golden Member

Golden Member

Golden Member