News Big movements afoot in ARM server land

soresu · Sep 22, 2020

Haven't seen a new thread discussing it yet and loathe to use necromancy on an older thread, so here talking about the Neoverse V1 and future N2 cores just announced.

V1 (codename Zeus on previous roadmap) is available for licensing now and seems to be basically the Cortex X1/Hera core with 2x256b SVE1 units, projected as max 96C per socket and +50% IPC over N1.

N2 (codename Perseus) is coming next year and supposedly based on the Cortex Axx core to succeed A78/Hercules, so possibly Matterhorn - it has 2x 128b SVE units though ARM are tight lipped on whether they are SVE2 or not, projected as max 128C per socket and +40% IPC over N1.

Link here to the Anandtech article with lotsa depth.

Thala · Sep 24, 2020

piokos said:
And can you point such a "most common use case"?

There's nothing inherently bad about x86 either (clearly). This isn't really a trigger for a revolution.

There are lots of issues which are intrinsically bad about x86.

Thala · Sep 24, 2020

piokos said:
That probably won't be able to run Windows ARM, Android or many existing ARM Linux distros (in contrast to the x86 reality of today).
We'll see how well x86 emulation works and if Macs are still capable as professional tools. I'm not holding my breath, but I'd like to be surprised - honestly.

I'm not sure MacBooks are a sensible alternative to RPis as a cheap platform for experiencing ARM.

Why would it not run ARM Windows and the more common Linux distributions? Since both are running under other VMs already (e.g. Windows running fine in a Linux KVM), i do not see why they would not run with Apples?

Doug S · Sep 24, 2020

Thala said:
Why would it not run ARM Windows and the more common Linux distributions? Since both are running under other VMs already (e.g. Windows running fine in a Linux KVM), i do not see why they would not run with Apples?

They will run under VM, but it is unlikely we'll ever see a Mac booting them unless Apple supplies GPU drivers.

Thala · Sep 24, 2020

Doug S said:
They will run under VM, but it is unlikely we'll ever see a Mac booting them unless Apple supplies GPU drivers.

Sure, but that was not the statement from Piokos, who claimed they will not run at all. Indeed it is highly unlikely that Apple bothers providing DX12 drivers - they will leave this task to the VM solutions like Parallels.

name99 · Sep 24, 2020

piokos said:
I don't understand why you mention 1995 so often. Some special year or what?

No. SVE guarantees compatibility between different vector register sizes. But the implementation impacts efficiency (both in memory and performance).

Data warehouse is a system. You probably meant:
- data centers - no, most of the time they don't compile the code that is run. You're doing it as a client.
- cloud service providers - yes, they maintain their services (but clients still take care of the VMs)

But this "fluidity" leads to exactly what I mentioned earlier: specialized, incompatible (or compatible but cumbersome) implementations.
Sure, it's all under a sexy umbrella of ARM, but so what?

And if we start to make ARM more universal, it'll start losing a lot of it's advantages - surely lightness, likely efficiency as well.

ARM is not faster. It's just more efficient in some solutions.
And yes, it's a very solid business proposition that will give x86 many years to adapt. Or Intel and AMD to drop it.

There's no doubt that x86 was a bit stagnant lately. But I don't think that's something the "enthusiast PC" community as a whole should criticize - given the shitstorm against AVX-512 or because Lakefield doesn't support AVX. This is an everyday normality in the ARM world.
Seriously, to everyone who treats ARM as a mystical golden grail of future computing - just get a cheap RPi and try setting it up as a second PC.

And yes, I'm writing this on my RPi 3A+.

1995 was perhaps the high water mark for the "classical windows" model of software distribution. Software distributed on physical media that you bought, the internet clearly on its way but no-one had yet internalized how much it would change things.
GPUs (the first harbinger of a new way of doing things, with on-demand finalization at the point of execution) and Java not yet big things.

Classical software distribution puts a premium on endless ISA compatibility. But all the various options I have been telling you about no longer care as much about endless ISA compatibility. Sure, you don't want a new ISA every year -- writing a compiler, writing an OS, optimization, take a LOT of work. But
- doing it once every 15 years or so is feasible. We did ARMv8 at around 2010, ARMv9 will be coming out soonish (and is likely close enough to ARMv8 that much of the tools and OS work won't be a huge ordeal).
- you can engage in annual tweaks of the ISA AND have those be of value, in the sense that code can re-compiled (in an app store), reJIT'd, reFinalized to take advantage of that new ISA right away.

I get the feeling you know nothing about SVE beyond a very superficial level.

Comparing a CPU that's deliberately optimized to be as cheap as possible in a system that's optimized to be as customizable as possible with a standardized PC seems like an attempt to score points, not top understand.
Want an ARM-based cheap PC that's easily set up and used? Buy an iPad.

name99 · Sep 24, 2020

piokos said:
That probably won't be able to run Windows ARM, Android or many existing ARM Linux distros (in contrast to the x86 reality of today).
We'll see how well x86 emulation works and if Macs are still capable as professional tools. I'm not holding my breath, but I'd like to be surprised - honestly.

I'm not sure MacBooks are a sensible alternative to RPis as a cheap platform for experiencing ARM.

I thought the conversation was about "where is mass future computer going" not about "what do linux/windows experts want from their future devices"?
NO-ONE is claiming that ARM (in Mac, in Chromebook, in Surface) will be a replacement for the futzing around you can do on your PC today. If that's the argument you want, go find a different strawman to fight with.

A more interesting question is whether this niche actually has any long-term viability. There have been complaints about the inability to tinker since the dawn of technology. And yet no-one fiddling with a PC considers it crippling that they can't change individual logic gates (let alone valves) to see what happens. What has always happened is that tinkerers move up the stack, finding as much interest in fiddling at a higher level even as the ability to tinker at a lower level goes away.

soresu · Sep 24, 2020

Thala said:
Indeed it is highly unlikely that Apple bothers providing DX12 drivers

Is VKD3D to MoltenVK not an option?

soresu · Sep 24, 2020

Oh dear, seems hopes of v9-A next year are dashed since the GCC commit for Neoverse N2 is v8.5-A with SVE/SVE2.

Read and weep.

soresu · Sep 24, 2020

Antey said:
wasn't K12 a Zen twin core?

In so much as they were designed together from the same base concept design.

However at this point I would be very surprised if Zen4 was not far more x86 specific than the early concept upon which both Zen1 and K12 were based.

"Porting" Zen4 would probably be more hassle than it was worth, especially considering how performant off the shelf ARM Ltd core designs have become since K12 was in development in 2015/16.

At this point it would be a rather silly waster of R&D funds to go haring off designing a new custom ARM core when they are still competing on both x86 and graphics.

A better use of funds would be a more mobile/embedded focused x86 core to succeed Jaguar if they are going down the big/little path for real - but I have my doubts about that happening when they are doing so well with Zen2 as to be almost under 10W with an 8 core APU SoC, with Rembrandt I would expect sub 10W to be easily achievable with a reasonable clock speed.

soresu · Sep 24, 2020

Thala said:
Why would it not run ARM Windows and the more common Linux distributions? Since both are running under other VMs already (e.g. Windows running fine in a Linux KVM), i do not see why they would not run with Apples?

You might manage it with a lot of bother - but I'd rather run on a slower, better supported hardware platform than a faster, jury rigged Apple one.

I just hope Cortex X1 based boards or Chromebooks don't take years to pop up considering how long we have already been waiting for A76 based solutions to enter the market so far.

At this point you have to wonder what is worse - the apathy of the ARM suppliers or the bribery/contra revenue from Intel feeding the Chromebook OEM's to keep them on ancient A72/A73 based SoC's.

Thala · Sep 24, 2020

soresu said:
Is VKD3D to MoltenVK not an option?

Sure, for Virtualization something like this is the solution because it provides mapping of DX12 to the native Metal driver in MacOS. However if we are talking Bootcamp, a native Windows DX12 driver for Apple GPUs would be required.

Thala · Sep 24, 2020

soresu said:
You might manage it with a lot of bother - but I'd rather run on a slower, better supported hardware platform than a faster, jury rigged Apple one.

I just hope Cortex X1 based boards or Chromebooks don't take years to pop up considering how long we have already been waiting for A76 based solutions to enter the market so far.

At this point you have to wonder what is worse - the apathy of the ARM suppliers or the bribery/contra revenue from Intel feeding the Chromebook OEM's to keep them on ancient A72/A73 based SoC's.

I sure would prefer an Qualcomm 8CX successor with 8 big cores (and SVE) on 5nm and having all the Windows drivers instead of using Apple HW. In any case at the moment i am still more than happy with my Surface Pro X.
At the moment Microsoft is working on OpenGL HW acceleration support for WSL including a Wayland compositor, which should give WSL a nice boost.

A/// · Sep 24, 2020

soresu said:
However at this point I would be very surprised if Zen4 was not far more x86 specific than the early concept upon which both Zen1 and K12 were based.

"Porting" Zen4 would probably be more hassle than it was worth, especially considering how performant off the shelf ARM Ltd core designs have become since K12 was in development in 2015/16.

Why Zen4 specifically? The way AMD has put it is this:

Zen: Base
Zen+: Fix
Zen2: Chiplet plus evolutionary design of Zen and Zen+
Zen3: New core architecture; "revolutionary"
Zen4: Evolutionary redesign of Zen3 uarch???

I never read much into K12 other than it was shelved due to AMD not having the funds at the time to run both products and wanting to focus on products they were good at or were good at some point. Right now, and this is strictly my own opinion, AMD excels at DC, client, and to some extent mobile. TGL is a lot of hot air but it's still impressive in certain ST situations. AMD needs to get their ass into gear for the mobile market. It's too early to tell how CDNA will perform but NVidia nearly has the market cornered. There's a lot of weird funk coming along in the next decade and to pull my old man card I'm not liking any of it.

soresu · Sep 24, 2020

Thala said:
At the moment Microsoft is working on OpenGL HW acceleration support for WSL including a Wayland compositor, which should give WSL a nice boost.

I think MS are also working on OGL and OCL -> DX12 translation layers, which would definitely be a big plus for long lagging legacy software support on WARM devices.

Link.

Once that is done they just need x64 -> ARM64 dynarec and it will open a lot more doors for WARM to expand sales.

Doug S · Sep 25, 2020

soresu said:
Oh dear, seems hopes of v9-A next year are dashed since the GCC commit for Neoverse N2 is v8.5-A with SVE/SVE2.

Read and weep.

Why are people so fixated on ARMv9? What are you expecting it will have that v8 lacks? Unless ARM makes some fundamental changes to the ISA that requires obsoleting some parts of the less than a decade old v8, there is no reason to go v9 rather than 8.6, 8.7, and so on.

ThatBuzzkiller · Sep 25, 2020

soresu said:
Is VKD3D to MoltenVK not an option?

It might be a viable option if Apple still keeps AMD as their high-end GPU supplier ...

If not then expect sub-10fps on native D3D12 applications on Apple silicon even on low settings since they have to run on TBDR GPUs. D3D12 API design matches closely with those of IMR GPUs so it'll be very painful trying to emulate IMR architectures on TBDR GPUs ...

Translation layers are more successful if their underlying software and hardware interfaces are similar. DXVK is somewhat of a success story on Linux with x86 CPUs and AMD/NV GPUs. DXVK over MoltenVK on macOS is an unmitigated disaster even with Intel CPUs and AMD GPUs because Metal is mainly designed to run on TBDR GPUs so it makes the ultimate compromise to not expose desktop GPU features in order to make their iPhones look good ...

Think of it like this, a D3D11/D3D12 driver implementations will have more in commonality with other desktop GPU Vulkan driver implementations compared to a Metal driver implementation ...

If macOS can't even handle DXVK (D3D11) all that well then the chances are just as low or slimmer where it'll be able to cope with VKD3D (D3D12) ...

name99 · Sep 25, 2020

soresu said:
Oh dear, seems hopes of v9-A next year are dashed since the GCC commit for Neoverse N2 is v8.5-A with SVE/SVE2.

Read and weep.

Hey we can still hope for the A15 and Apple getting there first!

name99 · Sep 25, 2020

Doug S said:
Why are people so fixated on ARMv9? What are you expecting it will have that v8 lacks? Unless ARM makes some fundamental changes to the ISA that requires obsoleting some parts of the less than a decade old v8, there is no reason to go v9 rather than 8.6, 8.7, and so on.

It depends on your interests.
(a) The most obvious constraints on ARM right now are in fact virtual address space. ARM is using as many high bits as possible (which isn't many!) to mark pointers for security purposes. Meanwhile Apple (and Android?) want as many pointer bits as possible for tagging.
How to handle this? One possibility is to go to 128 bit pointers, but that's problematic if it forces 128bit arithmetic. Alternatively make all addresses 64+64 bits (high 64 are permissions/tags, not "address") and somehow tie two registers together for all addressing purposes?
Or go a third direction and give the system capabilities (ie essentially a 65th bit that only the OS can see and manipulate)? IBM of course has done this for years, and CHERI is a similar system for ARM that prototyped the idea and appears to work.

(b) Second big problem is instruction size. 32-bits is becoming really tight (witness the uncomfortable contortions with SVE instructions). The obvious next step is to allow a combination of 32 and 64-bit addresses (as IBM has just started). But you can do this much better (ie in a way that allows the system to be more performant and secure) if you start from scratch rather than retrofit. My mechanism of choice would be that 32-bit instructions have the first bit set to 0, 64 bit instructions have the first bit of each instruction half set to 1. (Maybe also a rule that 64 bit instructions can't straddle a cache line?)
That way it's immediately obvious even at pre-decoding, and you can't engage in nonsense like jumping into the middle of an instruction.

ARM are clearly by far the best instruction set designers on the planet, balancing what's best for fast implementation, good compiler utilization, and future extensibility in away that no-one else (not Intel ever, not POWER, not RISC-V) come close to. Which means if you have any interest in ISA design you're curious about how they deal with these two issues (and, if they go to 32+64 bit wide instructions, how they will use the much larger instruction space available) because it's always interesting to see the next tasteful item created by people with a history of great taste.

soresu · Sep 25, 2020

Doug S said:
Why are people so fixated on ARMv9? What are you expecting it will have that v8 lacks? Unless ARM makes some fundamental changes to the ISA that requires obsoleting some parts of the less than a decade old v8, there is no reason to go v9 rather than 8.6, 8.7, and so on.

It's too late to make SVE2 standard for v8.5 or v8.6 as they are already published.

If SVE2 is to truly replace NEON it needs to become a standard linked to an ISA version as NEON did with v7-A.

It's not necessarily needed for SVE2 to become a standard part of the main ARM ISA - but I imagine devs will be less jittery about using it in their applications if they do so.

Also it's been longer since v8-A was first announced than it between v7-A and v8-A, so it's not a huge leap to imagine v9-A being right around the corner.

Especially after ARM went to the trouble of highlighting just how important and MAJOR multi year investments both SVE2 and TME when they were first announced were back in 2019.

Last point, there have been several job entries referring to v9 ISA/core development - this could have simply been a smokescreen for something else or later renamed but why bother even naming it if so?

soresu · Sep 25, 2020

name99 said:
Second big problem is instruction size. 32-bits is becoming really tight (witness the uncomfortable contortions with SVE instructions).

I assumed that 64 bit was a necessity for SVE - it really should be by now.

Thala · Sep 25, 2020

soresu said:
I think MS are also working on OGL and OCL -> DX12 translation layers, which would definitely be a big plus for long lagging legacy software support on WARM devices.

I already compiled the OGL->DX12 translation layer for ARM64 from GIT sources - and it is indeed working. Currently somewhat slow, as it has lots of debug code inside - to give you an idea Quake 3 is playable, Doom 3 is too slow.

Doug S · Sep 25, 2020

soresu said:
It's too late to make SVE2 standard for v8.5 or v8.6 as they are already published.

If SVE2 is to truly replace NEON it needs to become a standard linked to an ISA version as NEON did with v7-A.

It's not necessarily needed for SVE2 to become a standard part of the main ARM ISA - but I imagine devs will be less jittery about using it in their applications if they do so.

Also it's been longer since v8-A was first announced than it between v7-A and v8-A, so it's not a huge leap to imagine v9-A being right around the corner.

Especially after ARM went to the trouble of highlighting just how important and MAJOR multi year investments both SVE2 and TME when they were first announced were back in 2019.

Last point, there have been several job entries referring to v9 ISA/core development - this could have simply been a smokescreen for something else or later renamed but why bother even naming it if so?

SVE wasn't linked to a specific v8.x version, it was and still is optional even at v8.6. SVE2 is similarly optional. There hasn't exactly been a big rush to implement either, so making it mandatory would be foolish on the part of ARM. If they do a major rev of the ISA sure, you include it there - and remove NEON.

It really depends on what ARM's goal with v9 is. If it is like name99 suggests and supports 64 bit instructions in some form so they can keep adding stuff to SVE*, and especially if they go crazy and go to 128 bit addresses so it can support a whole mess of flags, v9 would be an ISA targeted primarily at servers.

It was obvious v7 was on borrowed time once ARM moved outside of the embedded world, since it was limited to 32 bit addresses. The fact that v8 cleaned up some issues with v7 so that it was actually a little bit faster made it fait accompli that the smartphone world would quickly go there. There's no pent up pressure for phones (or Macs) to go beyond 64 bit addresses, and there aren't any obvious Achilles heels in v8 that could be corrected so v9 is highly unlikely to perform better than v8 (it would in fact perform worse if it has 128 bit addresses) So just because ARM releases v9 doesn't mean there will be any reason to adopt it outside of the server market that it would likely be intended for.

Thala · Sep 25, 2020

soresu said:
I assumed that 64 bit was a necessity for SVE - it really should be by now.

There is lots of coding space left in 32 bit - SVE does not require 64 bit at all.

soresu · Sep 25, 2020

Doug S said:
If they do a major rev of the ISA sure, you include it there - and remove NEON.

No, this has already ratified by ARM themselves in the SVE2/TME announcement - future chips with SVE2 will retain NEON code compatibility.

Doug S said:
There hasn't exactly been a big rush to implement either, so making it mandatory would be foolish on the part of ARM.

SVE2 was just announced last year, and SVE1 lacked certain instructions that would give it more or less functional parity with NEON so even the core designers like Apple with the money to rush SVE1 into a product would not have bothered.

Did you expect them to announce it last year and have it in Cortex X1 this year?

ARM Ltd's PR emphasis of MULTI YEAR investment in SVE2 and TME suggests this is not a thing to be rushed and will be a lasting change.

Doug S said:
SVE wasn't linked to a specific v8.x version, it was and still is optional even at v8.6. SVE2 is similarly optional.

I understand this, and that's kind of the whole point I just made about v9-A in the first place.

Right now it is optional, and therefore only developers targeting specialized use cases (ie Fujitsu/Riken's post-K supercomputer) will make make the effort to put it in their applications (outside of auto vectorized compiler code).

Only by standardizing it to an ISA iteration will ARM get it to truly proliferate across the software ecosystem.

It seems extremely unlikely that they would standardize such a major telegraphed change to anything less than a major integer ISA revision.

I was only talking about SVE2 as both that is the only thing relevant to media ops ala NEON, and I would assume that anything with SVE2 also includes all of SVE anyway - as we are never going to see SVE1 in a consumer chip I see little point in talking about it (personally that is, I do understand it has uses beyond HPC/supercomputers).

Systems analyst · Sep 25, 2020

Here are a couple of additional ARM charts:

Up to 192 cores are mentioned, and CCIX linking sockets; I read mention of up to 4 sockets linked.

Linus Torvalds is reported as saying that you need ARM-based desktop machines to develop applications for the ARM cloud. Maybe, but this is now available with ThunderX2 machines at a price. Apple are giving an upgrade path from iPhone to iPad to laptops and desktops. It seems likely that they will transition iCloud to ARM, to enable proprietary applications developed on a desktop to have a growth path.

ARM design and test using ThunderX2 data-centre(s). For peak loads, they use ARM instances on AWS, as far a possible, though some work requires X86 instances. The V1-based systems may replace Intel for the latter purpose, in due course.

There is also a rumour that Microsoft wants half their server-fleet to use ARM-based processors.

It seems possible that ARM-based desktop machines and data-centres will proliferate, together with cloud use of ARM-based servers. We will have to wait and see.

News Big movements afoot in ARM server land

Platinum Member

Golden Member

Golden Member

Platinum Member

Golden Member

Senior member

Senior member

Platinum Member

Platinum Member

Platinum Member

Platinum Member

Golden Member

Golden Member

Diamond Member

Platinum Member

Platinum Member

Golden Member

Senior member

Senior member

Platinum Member

Platinum Member

Golden Member

Platinum Member

Golden Member

Platinum Member

Member