
Review Raspberry PI (ARM A72) vs EPYC for DC study. Interesting results.


Thala

Golden Member
Nov 12, 2014
1,255
564
136
octal-core NVIDIA Carmel ARMv8.2 CPU

For the ARM gurus, before I make a fool of myself, how recent is this core ? and on what process node ? It looks like 12nm. First tested Sept 2018, so should be more recent than Rome.

But how fast is it compared to some you have been mentioning here ?
@Mark
Carmel is on par with Cortex A75 give or take.
I already pointed out that if you want the most recent ARM core (a Cortex A76, mind you) that can run Linux out of the box (as part of WSL), it would be the Surface Pro X. All newer Cortex A CPUs are only available in phones as of today.

Also note that ARMv8.x refers to the ISA revision, not to a particular implementation. A higher ISA revision typically just means that more instructions are supported.
For example, A76, A77, A78 and AX1 are all ARMv8.2 compliant.
 

blckgrffn

Diamond Member
May 1, 2003
7,937
1,125
126
www.teamjuchems.com
All of those are OOS BTW
I was just saying that although they tout themselves as upgrades from a Pi the differences in terms of what you are looking for are minimal at best, given they use the same CPU architecture and more advanced GPU.

So these affordable alternatives would just be a waste of your time.
 

Markfw

CPU Moderator, VC&G Moderator, Elite Member
Super Moderator
May 16, 2002
21,404
9,466
136
Quick reminder - Naples was released in 2017 but Rome in 2019...
It was in user testing later in 2018. Not sure what you call it, but it was using ES chips. Yes, Aug 7th 2019 was the official release.

As for the chips, I am not spending over $1000 to test these on a Surface Pro X, so how close would that Xavier NX be to a 2019 release ARM?
 

A///

Senior member
Feb 24, 2017
922
643
136
It was in user testing later in 2018. Not sure what you call it, but it was using ES chips. Yes, Aug 7th 2019 was the official release.

As for the chips, I am not spending over $1000 to test these on a Surface Pro X, so how close would that Xavier NX be to a 2019 release ARM?
Correct! Milan was in ES testing around a year agoish and we only got solid info this past July thanks to someone digging into Red Hat's bug tracker for QEMU.
 

Markfw

CPU Moderator, VC&G Moderator, Elite Member
Super Moderator
May 16, 2002
21,404
9,466
136
Correct! Milan was in ES testing around a year agoish and we only got solid info this past July thanks to someone digging into Red Hat's bug tracker for QEMU.
So time wise (not exactly node wise) what is a comparable ARM chip to compare to Rome ? I want to do this if possible. I found a Xavier NX for $387 on Amazon, I could do that.
 

Thala

Golden Member
Nov 12, 2014
1,255
564
136
It was in user testing later in 2018. Not sure what you call it, but it was using ES chips. Yes, Aug 7th 2019 was the official release.

As for the chips, I am not spending over $1000 to test these on a Surface Pro X, so how close would that Xavier NX be to a 2019 release ARM?
Ok talking floating point IPC here:

A57->A72 25% (2015) - Raspberry Pi 4
A72->A73 5% (2016) - HP Envy X2
A73->A75 30% (2017) - Xavier NX
A75->A76 30-35% (2018) - Surface Pro X
A76->A77 30-35% (2019) - Samsung Galaxy S20
A77->A78 7% (2020)
A78->AX1 22% (2020)

So a Cortex A78 would be roughly twice as fast as an A75 clock for clock, and a Cortex AX1 would be even faster.
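For anyone who wants to check the "roughly twice" figure, here is a minimal Python sketch that compounds the per-generation gains listed above. It assumes the quoted percentages can simply be multiplied together; the `GAINS` table and the `compound` helper are names made up for this illustration.

```python
from math import prod

# Per-generation floating point IPC gains from the list above,
# stored as (low, high) fractions. Single figures use the same value twice.
GAINS = {
    "A75->A76": (0.30, 0.35),
    "A76->A77": (0.30, 0.35),
    "A77->A78": (0.07, 0.07),
    "A78->AX1": (0.22, 0.22),
}

def compound(steps):
    """Multiply the (low, high) gains of the given steps into a speedup range."""
    low = prod(1 + GAINS[s][0] for s in steps)
    high = prod(1 + GAINS[s][1] for s in steps)
    return low, high

a75_to_a78 = compound(["A75->A76", "A76->A77", "A77->A78"])
a75_to_ax1 = compound(["A75->A76", "A76->A77", "A77->A78", "A78->AX1"])

print("A75 -> A78: %.2fx to %.2fx" % a75_to_a78)
print("A75 -> AX1: %.2fx to %.2fx" % a75_to_ax1)
```

With these inputs the A75 to A78 range comes out a little under 2x, and the AX1 lands somewhat above 2x, which matches the rough statement above.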

Regarding power efficiency, part of the gain comes from the node and part from the architecture. For instance, the Cortex A78 achieves its 7% higher performance while using 4% less power than the Cortex A77 at the same node and frequency.
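To put the A78 example in perf/W terms, here is a small Python sketch. It assumes the 7% and 4% figures apply to the same workload at the same node and frequency, as stated; `perf_per_watt_gain` is just an illustrative helper name.

```python
def perf_per_watt_gain(perf_gain, power_change):
    """Relative change in performance per watt.

    perf_gain: fractional performance change (0.07 means 7% faster).
    power_change: fractional power change (-0.04 means 4% less power).
    """
    return (1 + perf_gain) / (1 + power_change) - 1

# Cortex A78 vs Cortex A77: 7% more performance at 4% less power.
gain = perf_per_watt_gain(0.07, -0.04)
print(f"A78 vs A77 perf/W: {gain:+.1%}")
```

So the architecture alone would be worth roughly 11 to 12% in perf/W for that generation, before any node change is counted.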
 

Markfw

CPU Moderator, VC&G Moderator, Elite Member
Super Moderator
May 16, 2002
21,404
9,466
136
Ok talking floating point IPC here:

A73->A75 30% (2017) - Xavier NX
A75->A76 30-35% (2018) - Surface Pro X
A76->A77 30-35% (2019) - Samsung Galaxy S20
A77->A78 7% (2020)
A78->AX1 22% (2020)

So a Cortex A78 would be roughly twice as fast as an A75 per clock cycle, and a Cortex AX1 would be even faster.

Regarding power efficiency, part of the gain comes from the node and part from the architecture. For instance, the Cortex A78 achieves its 7% higher performance while using 4% less power than the Cortex A77 at the same node and frequency.
So, not a good time for me to try to invest in ARM, since I can't buy any of the new ones. I have an iPhone 11, but I don't want it dying all the time (even if it could run BOINC). A Surface Pro X would be more expensive for the total compute power than a Rome.

I guess I need to stay with EPYC for now. Any other good ideas ?
 

sdifox

No Lifer
Sep 30, 2005
84,659
9,253
126

NTMBK

Diamond Member
Nov 14, 2011
9,298
2,714
136
Nintendo Switch is a Tegra X1 Quad A73+Quad A53. You can run linux on there. Please note that there is a new version incoming.

Switch uses A57 cores, not A73.
 

sdifox

No Lifer
Sep 30, 2005
84,659
9,253
126
Switch uses A57 cores, not A73.
Doh! Mea Culpa.


OP, do you know someone at a university? Nvidia does provide an education discount for their dev kits. Right now it is only showing the TX2, but I imagine the Xavier NX will show up soon.

 

eek2121

Golden Member
Aug 2, 2005
1,052
1,116
136
A72 is a design from ~2015 and not on a leading edge process like AMD either. So it's like comparing fresh Apples to outdated fried fruit.

Make no mistake, those ARM monolithic monsters will steal business from AMD/Intel. Once per core performance is decent enough, it is game over.
I try to avoid threads like this because they are full of ARM fanboys. However I will say this:

It’ll never happen. People assume x86 is some large, inefficient architecture. That is completely false. x86 is designed for high performance with moderate power consumption. ARM is designed for low-moderate performance at low power consumption. Some ARM chips sacrifice power consumption for performance (Graviton2, Apple A13), and some x86 CPUs sacrifice performance for low power consumption.

When it comes to scalable, raw perf/watt I’ve yet to see a single ARM CPU even come close. I cannot buy a 128 core/256 thread ARM server to put in my datacenter. Period. I cannot match x86 compute density with anything ARM related. Period.
 

eek2121

Golden Member
Aug 2, 2005
1,052
1,116
136
Ok talking floating point IPC here:

A57->A72 25% (2015) - Raspberry Pi 4
A72->A73 5% (2016) - HP Envy X2
A73->A75 30% (2017) - Xavier NX
A75->A76 30-35% (2018) - Surface Pro X
A76->A77 30-35% (2019) - Samsung Galaxy S20
A77->A78 7% (2020)
A78->AX1 22% (2020)

So a Cortex A78 would be roughly twice as fast as an A75 clock for clock, and a Cortex AX1 would be even faster.

Regarding power efficiency, part of the gain comes from the node and part from the architecture. For instance, the Cortex A78 achieves its 7% higher performance while using 4% less power than the Cortex A77 at the same node and frequency.
To catch up with Milan, ARM parts will need to be twice as fast as the A77 while consuming orders of magnitude less power. This is not hyperbole.

The A77 trails the Apple A13 and most current high end x86 chips by 30-40% in single threaded performance. The Apple A13 has 2 big cores that can match last year’s x86 in performance, but they run too hot and consume too much power to scale up. The A14 is expected to only be 5% faster in ST performance than the A13 and will be significantly slower than this year’s x86 chips (Zen 3, Tiger Lake, etc.) despite having a huge node advantage.
 

Doug S

Senior member
Feb 8, 2020
658
898
96
To catch up with Milan, ARM parts will need to be twice as fast as the A77 while consuming orders of magnitude less power. This is not hyperbole.

The A77 trails the Apple A13 and most current high end x86 chips by 30-40% in single threaded performance. The Apple A13 has 2 big cores that can match last year’s x86 in performance, but they run too hot and consume too much power to scale up. The A14 is expected to only be 5% faster in ST performance than the A13 and will be significantly slower than this year’s x86 chips (Zen 3, Tiger Lake, etc.) despite having a huge node advantage.
Who says Apple's cores "consume too much power to scale up"? They consume a lot less power than x86 laptop cores - let alone desktop/server cores. They may consume more power than ARM designed cores like the A77, but Apple only puts two of them in a phone instead of four like Qualcomm so they can afford a little extra power.

And who says A14 will only be 5% faster in ST performance? Apple's claim was that A14 is 40% faster than A12, which would imply a 16% gain over A13.
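The arithmetic behind that 16% can be sketched in a few lines of Python. Note the assumption: the A13 over A12 figure of 20% is not stated in this thread; it is taken here from Apple's earlier marketing claim, and the variable names are only illustrative.

```python
# Apple's claim quoted above: A14 is 40% faster than A12.
a14_over_a12 = 1.40

# Assumption (not from this thread): Apple's earlier claim that A13 is 20% faster than A12.
a13_over_a12 = 1.20

# Dividing the two ratios gives the implied A14 over A13 speedup.
a14_over_a13 = a14_over_a12 / a13_over_a12
implied_gain = a14_over_a13 - 1

print(f"Implied A14 gain over A13: {implied_gain:.1%}")
```

That works out to a gain in the 16 to 17% range, well above the 5% quoted earlier, provided both marketing numbers measure the same thing.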

Sounds like a lot of wishful thinking from an x86 fanboy. You better watch out, Apple's Macs (which will undoubtedly be faster than what they put in the phone) might burst your bubble.
 

Markfw

CPU Moderator, VC&G Moderator, Elite Member
Super Moderator
May 16, 2002
21,404
9,466
136
Who says Apple's cores "consume too much power to scale up"? They consume a lot less power than x86 laptop cores - let alone desktop/server cores. They may consume more power than ARM designed cores like the A77, but Apple only puts two of them in a phone instead of four like Qualcomm so they can afford a little extra power.

And who says A14 will only be 5% faster in ST performance? Apple's claim was that A14 is 40% faster than A12, which would imply a 16% gain over A13.

Sounds like a lot of wishful thinking from an x86 fanboy. You better watch out, Apple's Macs (which will undoubtedly be faster than what they put in the phone) might burst your bubble.
Well, I am certainly not impressed with the A13 in my iPhone 11. Slower than my Galaxy S9!
 

videogames101

Diamond Member
Aug 24, 2005
6,775
17
81
Hi Markfw, there is a lot of speculation *rim-shot* going on in this thread. To try to answer your question "Please tell me where I am mistaken in my math" directly: you are not mistaken. Your EPYC 7742 is a great part. I think the main disconnect here is that you are comparing against the Arm A72, which is 5 years old. It might not sound like much time, but it's like going from Bulldozer to Zen. The performance increases you'd see by moving from A72 to A76, A77, A78, X1, or any of the modern Apple Ax cores cannot be overstated, especially on a modern 7nm node. You would find the results significantly closer; see this chart from our friend Andrei on ST performance/power consumption:


So here's the kicker. These more modern Arm cores are not available anywhere except in high-margin devices like mid/high-tier Smartphones, expensive laptops or tablets, etc. There is no Raspberry Pi4 cost device with an A76-class core. There is not even a server box available with this class of core yet (Although I'm watching Ampere Computing closely!) This means all the hype you are seeing about Arm's power efficiency is not available for someone trying to run DC workloads cost effectively.

This is less about the Arm technology itself, and more about commercials. Taping out a modern core at 7nm is so expensive that no one has yet decided to target the low-margin tech enthusiast consumer market. It makes sense to me given that an ecosystem for compatible motherboards, etc. doesn't exist, and Microsoft doesn't even make Windows on Arm available for purchase. You'd be making a product for enthusiasts running Linux. Not exactly a high-revenue play. So unfortunately, I don't see this changing anytime soon.

If you are interested, you could try running your work unit on an AWS M6g instance. This would give you a more modern Arm core to benchmark on. Although you would not have access to power consumption numbers.
 

Markfw

CPU Moderator, VC&G Moderator, Elite Member
Super Moderator
May 16, 2002
21,404
9,466
136
Hi Markfw, there is a lot of speculation *rim-shot* going on in this thread. To try to answer your question "Please tell me where I am mistaken in my math" directly: you are not mistaken. Your EPYC 7742 is a great part. I think the main disconnect here is that you are comparing against the Arm A72, which is 5 years old. It might not sound like much time, but it's like going from Bulldozer to Zen. The performance increases you'd see by moving from A72 to A76, A77, A78, X1, or any of the modern Apple Ax cores cannot be overstated, especially on a modern 7nm node. You would find the results significantly closer; see this chart from our friend Andrei on ST performance/power consumption:


So here's the kicker. These more modern Arm cores are not available anywhere except in high-margin devices like mid/high-tier Smartphones, expensive laptops or tablets, etc. There is no Raspberry Pi4 cost device with an A76-class core. There is not even a server box available with this class of core yet (Although I'm watching Ampere Computing closely!) This means all the hype you are seeing about Arm's power efficiency is not available for someone trying to run DC workloads cost effectively.

This is less about the Arm technology itself, and more about commercials. Taping out a modern core at 7nm is so expensive that no one has yet decided to target the low-margin tech enthusiast consumer market. It makes sense to me given that an ecosystem for compatible motherboards, etc. doesn't exist, and Microsoft doesn't even make Windows on Arm available for purchase. You'd be making a product for enthusiasts running Linux. Not exactly a high-revenue play. So unfortunately, I don't see this changing anytime soon.

If you are interested, you could try running your work unit on an AWS M6g instance. This would give you a more modern Arm core to benchmark on. Although you would not have access to power consumption numbers.
Thanks for all that, a very civil and nicely worded response. Yes, with all the hype going on about it, I did not realize that ARM would only be good in the extremely high end devices. Your explanation makes it clear why. I have decided to abandon ARM until it becomes more "consumer friendly". Even EPYC would not seem a good way to go, but I have found a few eBay deals that make it worth the investment. A 32 core EPYC for $379? Yup... But that's Naples. I am going to try to only get Rome from now on.
 

JoeRambo

Golden Member
Jun 13, 2013
1,139
1,002
136
You'd be making a product for enthusiasts running Linux. Not exactly a high-revenue play. So unfortunately, I don't see this changing anytime soon.
It is changing. There is a ton of money and effort being pumped into the ARM ecosystem. From gcc/llvm compilers to app porting to system level software like JVMs and Linux distros, mindshare is growing each day.
What is currently missing is a "desktop" development machine for ARM, and Apple is going to solve this not only for their chips, but for others too.

Without doubt cloud and hyper scalers will migrate first. The rest will take longer and won't be complete.

On a business scale, what is happening is what Intel did to the RISC competition in the 1990s: cheap, high volume consumer chips enabled pouring money into server designs that rode on process development fueled by said consumer chips. It had nothing to do with architecture; Intel only had to be "good enough".
Graviton2 is not good enough, Graviton3 might be. Having a good enough CPU without paying AMD/Intel margins will give huge advantages in TCO and push the competition towards ARM as well.

AMD is a bit of a wildcard here, because they are riding the same ARM-market-fueled process advantage wave, but IMO they are too small to matter in the long run.
 
Apr 30, 2015
131
10
81
It is changing. There is a ton of money and effort being pumped into the ARM ecosystem. From gcc/llvm compilers to app porting to system level software like JVMs and Linux distros, mindshare is growing each day.
What is currently missing is a "desktop" development machine for ARM, and Apple is going to solve this not only for their chips, but for others too.

Without doubt cloud and hyper scalers will migrate first. The rest will take longer and won't be complete.

On a business scale, what is happening is what Intel did to the RISC competition in the 1990s: cheap, high volume consumer chips enabled pouring money into server designs that rode on process development fueled by said consumer chips. It had nothing to do with architecture; Intel only had to be "good enough".
Graviton2 is not good enough, Graviton3 might be. Having a good enough CPU without paying AMD/Intel margins will give huge advantages in TCO and push the competition towards ARM as well.

AMD is a bit of a wildcard here, because they are riding the same ARM-market-fueled process advantage wave, but IMO they are too small to matter in the long run.
I agree. When writing a new application to run on an SVE machine, I would want to experiment on a desktop first, to 'shape' the algorithm's design to take best advantage of SVE. There is no substitute for practice. It can then be ported to the cloud, for further tests.

As for Graviton2, it is already useful for various use cases; ARM themselves use it for some, but not all, use cases when testing new designs.
Graviton3 will cover more use cases.
 

moinmoin

Platinum Member
Jun 1, 2017
2,556
3,275
136
@JoeRambo Another wildcard is how the ecosystem takes Nvidia's ownership of Arm, should the deal go through. While I agree x86 should have a hard time against Arm in the long run, Nvidia has plenty of chances to mess that up once they are in control.
 

Doug S

Senior member
Feb 8, 2020
658
898
96
Well, I am certainly not impressed with the A13 in my iPhone 11. Slower than my Galaxy S9!
That's ridiculous, you'd have to search far and wide to come up with something the S9's CPU does faster than the iPhone's. If you want to make this claim, let's see the numbers that back it up.
 

LightningZ71

Senior member
Mar 10, 2017
800
730
136
I've been doing some more digging on the SBCs and there's a common thread with most of them: they have rather limited memory and storage systems. Only one or two of them have anything faster than eMMC or USB-attached storage. Most of them have very anemic RAM setups as well.

The only SBCs that seem to take memory and storage seriously are the Nvidia Xavier models. Unfortunately, the roughly A75-ish cores on the Xaviers aren't especially potent in raw compute, though with code and libraries that specifically target the advanced features they do have, they are decent for their generation. What they do have, though, is a very impressive cache and memory subsystem. If the code being run on them is memory and I/O intensive, they should blow the figurative doors off any of the other SBCs. So how good they can be is going to be VERY situational.

The other thing to keep in mind is that, with the Xaviers, if you are just running things on their ARM cores, you're really wasting the rest of their resources. They have extensive GPU and ML capabilities. They are Volta-level CUDA compliant, so you would get your best results in producing usable work from them if you were to harness those resources as well. Perhaps running one GPU compute task on the GPU section and five CPU tasks on the ARM cores would give you the absolute best results.

In addition to the above hardware situation, you have issues with kernel and module software support across the whole ecosystem of SBCs. The iGPU sections tend to be very poorly supported under Linux, and there are precious few projects out there that can even use them in their current form. This issue makes the Xaviers stand out even more. Nvidia and the community are providing a considerable amount of software and driver support for them. That extensive GPU section is quite accessible.

I close with this: none of the SBCs are physically capable of offering enough raw performance on the CPU side alone to even approach the throughput of even mid-level dedicated GPUs. GPUs are so heavily optimized for the types of embarrassingly parallel functions that a lot of distributed computing tasks perform that they produce multiples of the performance per watt of even these power-sipping SBCs run in parallel. Yes, some tasks just aren't suited to GPUs. In those cases, the question is: do the SBCs manage to produce more compute performance per watt than a power-constrained 3990X? On a minimalist board, with as much disabled on it as possible and the TDP turned down, can a gang of SBCs manage to outproduce it per watt? You can probably buy enough of them with the same money to outproduce it, but you'll probably chew through significantly more power, not to mention the headache of managing the better part of a hundred nodes in the process.

I think that the next place to look may be lower mid-range phones. Many of them have decently potent CPUs on somewhat modern process technologies. Especially for DC projects that work with the Android version of BOINC, they might be worth looking at. With a reasonable duty cycle set, and an efficient gang charger with short wires and decent ventilation, you might get a surprisingly good return on them.
 

Markfw

CPU Moderator, VC&G Moderator, Elite Member
Super Moderator
May 16, 2002
21,404
9,466
136
That's ridiculous, you'd have to search far and wide to come up with something the S9's CPU does faster than the iPhone's. If you want to make this claim, let's see the numbers that back it up.
Do you have a cochlear implant like me? Do you use your phone to control it? There are no benchmarks for this, so you will have to take my word for it. If you don't like that, too bad; go kill your hearing with drugs or something and get a cochlear implant. Then talk to me.
 
