In other words, we still know nothing about how ARM scales up.
What do you want to know? You won't get a useful answer if you don't pose a well-formed question.
The first question is one of correctness: can ARM scale up? The answer is clearly yes. "They" (i.e. some combination of ARM Ltd and the various large SoC vendors) have a NoC that scales to at least 96 cores, and a solution (perhaps directory-based) for handling coherence across that many cores.
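To make the coherence point concrete, here's a minimal sketch of what a directory buys you at this scale; the field sizes, state encoding, and 96-core count are illustrative assumptions, not any vendor's actual design. The idea is that each line's directory entry records exactly which cores hold a copy, so a write invalidates only the sharers instead of broadcasting to the whole chip.

```c
/* Toy directory entry: why directory coherence scales where pure
 * broadcast snooping doesn't. One entry per cache line tracks exactly
 * which cores hold a copy, so an invalidate goes only to sharers.
 * Field sizes are illustrative assumptions for a ~96-core chip. */
#include <stdint.h>
#include <stdio.h>

typedef struct {
    uint64_t sharers[2];   /* 128-bit presence mask: one bit per core */
    uint8_t  owner;        /* core holding the line in Modified state */
    uint8_t  state;        /* e.g. 0=Invalid, 1=Shared, 2=Modified */
} dir_entry;

/* On a write by `core`, invalidate only the cores whose bit is set,
 * instead of broadcasting to all 96. */
static int invalidations_needed(const dir_entry *e, int core) {
    int n = 0;
    for (int c = 0; c < 96; c++) {
        if (c == core) continue;
        if (e->sharers[c / 64] & (1ULL << (c % 64))) n++;
    }
    return n;
}

int main(void) {
    dir_entry e = { {0x5ULL, 0}, 0, 1 };   /* cores 0 and 2 share the line */
    printf("invalidates on write by core 0: %d\n", invalidations_needed(&e, 0));
    return 0;
}
```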
So correctness works. Then we have performance. And the issue, again, is: what do you want to know?
There are multiple issues:
- there is total bandwidth. This is going to be constrained, more than anything else, by the number of memory controllers. These will presumably be scaled to what the data-warehouse vendors *normally* require, not to the most demanding workload imaginable. Right now these cores are targeting the cheap, easily-ported segment of the market; they're not trying to be specialized (and very expensive) engines for the most demanding jobs. (A back-of-envelope sketch follows this list.)
- how well does the LLC handle the differing demands of all these clients? This includes things like arbitrating between multiple prefetchers, but also topology, how much data to replicate in different slices, along with fancier ideas: virtual write cache, LLC compression, dead-line prediction, ... (a toy slice-hashing model follows the list)
- how well are locking primitives and barriers handled, the things that need to enforce an ordering across more than one core (sketched in the last example below)
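On the bandwidth point, here's a back-of-envelope sketch of how per-core bandwidth shrinks as cores outgrow memory controllers. The channel speed (DDR4-3200, ~25.6 GB/s peak per 64-bit channel) and the 8-controller SoC are assumed numbers for illustration, not any vendor's spec.

```c
/* Back-of-envelope: aggregate DRAM bandwidth divided across cores.
 * All numbers below are illustrative assumptions, not vendor specs. */
#include <stdio.h>

int main(void) {
    const double chan_gbps = 25.6;   /* assumed: one DDR4-3200 64-bit channel, ~25.6 GB/s peak */
    const int channels = 8;          /* assumed: 8 memory controllers/channels on the SoC */
    const int core_counts[] = {16, 32, 64, 96};

    for (int i = 0; i < 4; i++) {
        int cores = core_counts[i];
        double total = chan_gbps * channels;
        printf("%3d cores: %6.1f GB/s total, %5.2f GB/s per core\n",
               cores, total, total / cores);
    }
    return 0;
}
```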
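On the LLC point, here's a toy model of the kind of address-to-slice hashing a distributed LLC might use, which is where the topology and replication questions come from: a line's home slice may be many hops away from the core that missed. The XOR-fold hash and the slice count are invented for illustration, not any real chip's scheme.

```c
/* Toy model of a sliced LLC: a physical address is hashed to one of
 * N slices, so every core's misses are spread across the whole chip.
 * The XOR-fold hash and slice count are illustrative assumptions. */
#include <stdint.h>
#include <stdio.h>

#define NUM_SLICES 32          /* assumed slice count */
#define LINE_BITS  6           /* 64-byte cache lines */

static unsigned slice_of(uint64_t paddr) {
    uint64_t line = paddr >> LINE_BITS;   /* drop offset-within-line bits */
    /* XOR-fold the line address so strided patterns still spread out */
    line ^= line >> 5;
    line ^= line >> 11;
    return (unsigned)(line % NUM_SLICES);
}

int main(void) {
    /* A streaming access pattern lands on many different slices: that's
     * the point. Bandwidth scales with slice count, but a hit may live
     * in a slice many hops away (hence the replication question). */
    for (uint64_t a = 0; a < 8 * 64; a += 64)
        printf("paddr 0x%04llx -> slice %u\n",
               (unsigned long long)a, slice_of(a));
    return 0;
}
```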
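And on locks and barriers, a minimal sketch of why cross-core ordering gets expensive: one shared atomic counter forces its cache line to bounce between every participating core, while per-thread counters stay core-local. Thread and iteration counts are arbitrary; compile with -pthread and compare the two timings.

```c
/* Contrast: one shared atomic counter vs. per-thread counters.
 * The gap between the two timings is (roughly) coherence traffic.
 * NTHREADS and ITERS are arbitrary illustrative choices. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

#define NTHREADS 8
#define ITERS    1000000L

static atomic_long shared_ctr;
/* one 64-byte row per thread, so the rows don't false-share */
static volatile long local_ctr[NTHREADS][8];

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

static void *shared_worker(void *arg) {
    (void)arg;
    for (long i = 0; i < ITERS; i++)
        atomic_fetch_add(&shared_ctr, 1);  /* this line ping-pongs between cores */
    return NULL;
}

static void *local_worker(void *arg) {
    long id = (long)(intptr_t)arg;
    for (long i = 0; i < ITERS; i++)
        local_ctr[id][0]++;                /* stays in this core's cache */
    return NULL;
}

static double run(void *(*fn)(void *)) {
    pthread_t t[NTHREADS];
    double t0 = now_sec();
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, fn, (void *)(intptr_t)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return now_sec() - t0;
}

int main(void) {
    printf("shared counter: %.3f s\n", run(shared_worker));
    printf("local counters: %.3f s\n", run(local_worker));
    return 0;
}
```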
The bottom line is that there's no single number or benchmark that gives you useful answers to these three questions. If your goal is to behave like a sports fan, you can latch onto something, misinterpret it, and wave a flag; but if your goal is understanding, you're somewhat stuck. The best one can honestly do is look at things like success stories on Graviton 2, or occasional similar blog posts from the usual companies that operate in this space and are reasonably transparent (like Cloudflare).
And sure, those blog posts will mainly tell you how a certain type of code runs on these many-core chips; they won't tell you how very different code runs. If Graviton 2 was not designed for massive-bandwidth HPC calculations, no-one's going to try running their QCD code there. If it's a poor fit for SAP, no-one's going to run SAP on it.
But NONE OF THAT tells you anything about "ARM's ability to scale". It tells you about the market that the ARM vendors are currently targeting. Which you should already know. A sane company doesn't decide that its first (or second) generation product is going to target not just the low-hanging fruit but every computational task in the enterprise universe, from z to HPC to SAP to AWS!