Review Raspberry PI (ARM A72) vs EPYC for DC study. Interesting results.

Markfw · Sep 25, 2020

OK, now this is a very specific test, but as much as they are disparate, I don't think other scenarios would be that different.

So I got an 8 gig Raspberry PI (8 gig, 32 gig HD, A72 v8 4 core 1.5 ghz). Its does the Open Pandenics covid 19 WCG unit, 4 cores times 7.5 hours

My EPYC 7742 does 128 of these same units at 2 ghz in 2.5 hours.

So it would take 32 PI's to do the same work, but 3 times slower. An 32 PIs at 5 watts each is 160 watts, vs the 7742 @250 watts. (motherboard, memory and all) So more power and 3 times longer for the same work is 2 times the power usage. (1200 watts total vs 625)

Cost... 32 PIs at about $60 each (including power supplies and cables, probably more) would be at least $1920.

The EPYC is about $4000+480+580 or $5060. (I paid $3000 less, that retail on ebay)

So the EPYC effectively less because it cost 2 times less electricity for the same work.

This seems WAY different than I am seeing when people talk about ARM and efficiency and power usage. Please tell me where I am mistaken in my math., But don't be a jerk if you find the errors of my ways. The run times are real.. The PI is at 7.5 hours and hasn't actually finished a unit yes, no other tasks running on the PI. And I am looking at several units on the 7742, one is at 97% in 2:21.

Edit: The first unit finished in 7:50, almost 8 hours. The next 2 units are at 8 hours and still running at 96 and 98%.

And my comment on "wins by a large margin" has to do with total power usage, which in data centers, and for me is a big deal.

Thala · Sep 25, 2020

Markfw said:
octal-core NVIDIA Carmel ARMv8.2 CPU

For the ARM gurus, before I make a fool of myself, how recent is this core ? and on what process node ? It looks like 12nm. First tested Sept 2018, so should be more recent than Rome.

But how fast is it compared to some you have been mentioning here ?

@Mark
Carmel is on par with Cortex A75 give or take.
I already did point out, that if you want the most recent ARM core (a Cortex A76, mind you?), which can run Linux (as part of WSL) out of the Box, it would be the Surface Pro X. All newer Cortex A CPUs are only available in Phones as of today.

Also of note is, that ARMv8.x refers to the ISA revision, not necessarily to a particular implementation. A higher ISA revision typically means, later versions supporting more instructions.
For example A76, A77, A78 and AX are all ARMv8.3 compliant.

blckgrffn · Sep 25, 2020

Markfw said:
All of those are OOS BTW

I was just saying that although they tout themselves as upgrades from a Pi the differences in terms of what you are looking for are minimal at best, given they use the same CPU architecture and more advanced GPU.

So these affordable alternatives would just be a waste of your time.

Thala · Sep 25, 2020

Markfw said:
Rome is 2-3 years old. So I thought I was comparing recent to recent tech.

Quick reminder - Naples was released in 2017 but Rome in 2019...

Markfw · Sep 25, 2020

Thala said:
Quick reminder - Naples was released in 2017 but Rome in 2019...

It was in user testing later in 2018. Not sure what you call is, but using ES chips.. Yes, Auf 7th 2019 was the official release.

As for the chips, I am not spending over $1000 to test these under a surface pro x., so how close would that Xavier NX be to a 2019 release arm ?

A/// · Sep 25, 2020

Markfw said:
It was in user testing later in 2018. Not sure what you call is, but using ES chips.. Yes, Auf 7th 2019 was the official release.

As for the chips, I am not spending over $1000 to test these under a surface pro x., so how close would that Xavier NX be to a 2019 release arm ?

Correct! Milan was in ES testing around a year agoish and we only got solid info this past July thanks to someone digging into Red Hat's bug tracker for QEMU.

Markfw · Sep 25, 2020

A/// said:
Correct! Milan was in ES testing around a year agoish and we only got solid info this past July thanks to someone digging into Red Hat's bug tracker for QEMU.

So time wise (not exactly node wise) what is a comparable ARM chip to compare to Rome ? I want to do this if possible. I found a Xavier NX for $387 on Amazon, I could do that.

Thala · Sep 25, 2020

Markfw said:
It was in user testing later in 2018. Not sure what you call is, but using ES chips.. Yes, Auf 7th 2019 was the official release.

As for the chips, I am not spending over $1000 to test these under a surface pro x., so how close would that Xavier NX be to a 2019 release arm ?

Ok talking floating point IPC here:

A57->A72 25% (2015) - Raspberry Pi 4
A72->A73 5% (2016) - HP Envy X2
A73->A75 30% (2017) - Xavier NX
A75->A76 30-35% (2018) - Surface Pro X
A76->A77 30-35% (2019) - Samsung Galaxy S20
A77->A78 7% (2020)
A78->AX1 22% (2020)

So a Cortex A78 would be roughly twice as fast as an A75 clock per clock, an Cortex AX1 would be even faster.

Regarding power efficiency, part is coming from node, the other from architecture. For instance the Cortex A78 achieves the 7% higher performance while using 4% less power at same node and frequency compared to Cortex A77.

Markfw · Sep 25, 2020

Thala said:
Ok talking floating point IPC here:

A73->A75 30% (2017) - Xavier NX
A75->A76 30-35% (2018) - Surface Pro X
A76->A77 30-35% (2019) - Samsung Galaxy S20
A77->A78 7% (2020)
A78->AX1 22% (2020)

So a Cortex A78 would be roughly twice as fast as an A75 per clock cycle, an Cortex AX1 would be even faster.

Regarding power efficiency, part is coming from node, the other from architecture. For instance the Cortex A78 achieves the 7% higher performance while using 4% less power at same node and frequency compared to Cortex A77.

So, not a good time for me to try and invest in ARM, since I can;t buy any of the new ones, and I have an iphone11 ,but don't want it dying all the time (even if it could run BOINC). A surface x would be more expensive per total power than a Rome.

I guess I need to stay with EPYC for now. Any other good ideas ?

LightningZ71 · Sep 26, 2020

I wonder, can a surface pro x even be loaded with Linux?

sdifox · Sep 26, 2020

Nintendo Switch is a Tegra X1 Quad A73+Quad A53. You can run linux on there. Please note that there is a new version incoming.

FCC Filing Hints at New Switch SOC, Memory Components for Switch Dev Kit - Rumor

www.nintendoworldreport.com

Every Nintendo Switch can run Linux and homebrew following release of unpatchable exploit

Since the hack is at the ROM level, it's impossible for Nintendo to completely patch the exploit.

mobilesyrup.com

NTMBK · Sep 26, 2020

sdifox said:
Nintendo Switch is a Tegra X1 Quad A73+Quad A53. You can run linux on there. Please note that there is a new version incoming.

FCC Filing Hints at New Switch SOC, Memory Components for Switch Dev Kit - Rumor

www.nintendoworldreport.com

Every Nintendo Switch can run Linux and homebrew following release of unpatchable exploit

Since the hack is at the ROM level, it's impossible for Nintendo to completely patch the exploit.

mobilesyrup.com

Switch uses A57 cores, not A73.

sdifox · Sep 26, 2020

[/QUOTE]

NTMBK said:
Switch uses A57 cores, not A73.

Doh! Mea Culpa.

Op do you know someone in university? Nvidia does provide education discount for their dev kit. Right now it is only showing TX2 but I imagine soon the Xavier NX will show up.

Jetson for AI Education

developer.nvidia.com

eek2121 · Sep 26, 2020

JoeRambo said:
A72 is a design from ~2015 and not on leading edge process like AMD either. So its like comparing fresh Apples to outdated fried fruit.

Make no mistake, those ARM monolithic monsters will steal business from AMD/Intel. Once per core performance is decent enough, it is game over.

I try to avoid threads like this because they are full of ARM fanboys. However I will say this:

It’ll never happen. People assume x86 is some large, inefficient architecture. That is completely false. x86 is designed for high performance with moderate power consumption. ARM is designed for low-moderate performance at low power consumption. Some ARM chips sacrifice power consumption for performance (Graviton2, Apple A13), and some x86 CPUs sacrifice performance for low power consumption.

When it comes to scalable, raw perf/watt I’ve yet to see a single ARM CPU even come close. I cannot buy a 128 core/256 thread ARM server to put in my datacenter. Period. I cannot match x86 compute density with anything ARM related. Period.

eek2121 · Sep 26, 2020

Thala said:
Ok talking floating point IPC here:

A57->A72 25% (2015) - Raspberry Pi 4
A72->A73 5% (2016) - HP Envy X2
A73->A75 30% (2017) - Xavier NX
A75->A76 30-35% (2018) - Surface Pro X
A76->A77 30-35% (2019) - Samsung Galaxy S20
A77->A78 7% (2020)
A78->AX1 22% (2020)

So a Cortex A78 would be roughly twice as fast as an A75 clock per clock, an Cortex AX1 would be even faster.

Regarding power efficiency, part is coming from node, the other from architecture. For instance the Cortex A78 achieves the 7% higher performance while using 4% less power at same node and frequency compared to Cortex A77.

To catch up with Milan, ARM parts will need to be twice as fast as the A77 while consuming orders of magnitude less power. This is not hyperbole.

The A77 trails the Apple A13 and most current high end x86 chips by 30-40% in single threaded performance. The Apple A13 has 2 big cores that can match last year’s x86 in performance, but they run too hot and consume too much power to scale up. The A14 is expected to only be 5% faster in ST performance than the A13 and will be significantly slower than this year’s x86 chips (Zen 3, Tiger Lake, etc.) despite having a huge node advantage.

Doug S · Sep 26, 2020

eek2121 said:
To catch up with Milan, ARM parts will need to be twice as fast as the A77 while consuming orders of magnitude less power. This is not hyperbole.

The A77 trails the Apple A13 and most current high end x86 chips by 30-40% in single threaded performance. The Apple A13 has 2 big cores that can match last year’s x86 in performance, but they run too hot and consume too much power to scale up. The A14 is expected to only be 5% faster in ST performance than the A13 and will be significantly slower than this year’s x86 chips (Zen 3, Tiger Lake, etc.) despite having a huge node advantage.

Who says Apple's cores "consume too much power to scale up"? They consume a lot less power than x86 laptop cores - let alone desktop/server cores. They may consume more power than ARM designed cores like the A77, but Apple only puts two of them in a phone instead of four like Qualcomm so they can afford a little extra power.

And who says A14 will only be 5% faster in ST performance? Apple's claim was that A14 is 40% faster than A12, which would imply a 16% gain over A13.

Sounds like a lot of wishful thinking from an x86 fanboy. You better watch out, Apple's Macs (which will undoubtedly be faster than what they put in the phone) might burst your bubble.

Markfw · Sep 26, 2020

Doug S said:
Who says Apple's cores "consume too much power to scale up"? They consume a lot less power than x86 laptop cores - let alone desktop/server cores. They may consume more power than ARM designed cores like the A77, but Apple only puts two of them in a phone instead of four like Qualcomm so they can afford a little extra power.

And who says A14 will only be 5% faster in ST performance? Apple's claim was that A14 is 40% faster than A12, which would imply a 16% gain over A13.

Sounds like a lot of wishful thinking from an x86 fanboy. You better watch out, Apple's Macs (which will undoubtedly be faster than what they put in the phone) might burst your bubble.

Well, I am certainly not impressed withthe A13 in my iphone 11. Slower than my Galaxy 9 !

videogames101 · Sep 26, 2020

Hi Markfw, there is a lot of speculation *rim-shot* going on in this thread. To try answer your question "Please tell me where I am mistaken in my math" directly, you are not mistaken. Your EPYC 7742 is a great part. I think main disconnect here is that you are comparing against the Arm A72, which is 5 years old. It might not sound like much time, but it's like going from Bulldozer to Zen. The performance increases you'd see by moving from A72 to A76, A77, A78, X1, or any of the modern Apple Ax cores cannot be understated. Especially on a modern 7nm node. You would find the results significantly closer, see this chart from our friend Andrei on ST performance/power consumption:

https://pbs.twimg.com/media/EiXWBJWXcAAeVu3?format=png&name=900x900

So here's the kicker. These more modern Arm cores are not available anywhere except in high-margin devices like mid/high-tier Smartphones, expensive laptops or tablets, etc. There is no Raspberry Pi4 cost device with an A76-class core. There is not even a server box available with this class of core yet (Although I'm watching Ampere Computing closely!) This means all the hype you are seeing about Arm's power efficiency is not available for someone trying to run DC workloads cost effectively.

This is less about the Arm technology itself, and more about commercials. Taping out a modern core out at 7nm is so expensive that no one has yet decided to target the low-margin tech enthusiast consumer market. It makes sense to me given that an ecosystem for compatible motherboards, etc. doesn't exist, and Microsoft doesn't even make Windows on Arm available for purchase. You'd be making a product for enthusiasts running Linux. Not exactly a high-revenue play. So unfortunately, I don't see this changing anytime soon.

If you are interested, you could try running your work unit on an AWS M6g instance. This would give you a more modern Arm core to benchmark on. Although you would not have access to power consumption numbers.

Markfw · Sep 26, 2020

videogames101 said:
Hi Markfw, there is a lot of speculation *rim-shot* going on in this thread. To try answer your question "Please tell me where I am mistaken in my math" directly, you are not mistaken. Your EPYC 7742 is a great part. I think main disconnect here is that you are comparing against the Arm A72, which is 5 years old. It might not sound like much time, but it's like going from Bulldozer to Zen. The performance increases you'd see by moving from A72 to A76, A77, A78, X1, or any of the modern Apple Ax cores cannot be understated. Especially on a modern 7nm node. You would find the results significantly closer, see this chart from our friend Andrei on ST performance/power consumption:

https://pbs.twimg.com/media/EiXWBJWXcAAeVu3?format=png&name=900x900

So here's the kicker. These more modern Arm cores are not available anywhere except in high-margin devices like mid/high-tier Smartphones, expensive laptops or tablets, etc. There is no Raspberry Pi4 cost device with an A76-class core. There is not even a server box available with this class of core yet (Although I'm watching Ampere Computing closely!) This means all the hype you are seeing about Arm's power efficiency is not available for someone trying to run DC workloads cost effectively.

This is less about the Arm technology itself, and more about commercials. Taping out a modern core out at 7nm is so expensive that no one has yet decided to target the low-margin tech enthusiast consumer market. It makes sense to me given that an ecosystem for compatible motherboards, etc. doesn't exist, and Microsoft doesn't even make Windows on Arm available for purchase. You'd be making a product for enthusiasts running Linux. Not exactly a high-revenue play. So unfortunately, I don't see this changing anytime soon.

If you are interested, you could try running your work unit on an AWS M6g instance. This would give you a more modern Arm core to benchmark on. Although you would not have access to power consumption numbers.

Thanks for all that, a very civil and nicely worded response. Yes, I did not recognize that ARM would be only good for the extremely high end devices, with all the hype going on about it. Your explanation makes it clear why. I have decided to abandon ARM until it becomes more "consumer friendly". Even EPYC would seem not a good way to go, but I have found a few ebay deals that make it worth the investment. a 32 core EPYC for $379 ? Yup... But thats Naples. I am go to try and only get Rome from now on.

JoeRambo · Sep 26, 2020

videogames101 said:
ou'd be making a product for enthusiasts running Linux. Not exactly a high-revenue play. So unfortunately, I don't see this changing anytime soon.

It is changing. There is a ton of money and effort being pumped into ARM ecosystem. From gcc/llvm compilers to app porting to system level software like JVM's and Linux distros, there is mindshare growing each day.
What is currently missing, is "desktop" development machine for ARM and Apple is going to solve this not only for their chips, but for others too.

Without doubt cloud and hyper scalers will migrate first. The rest will take longer and won't be complete.

On business scale, what is happening is what Intel did to RISC competition in 1990s, cheap, high volume customer chips enabled pouring money into server designs that rode on process development fueled by said customer chips. It had nothing to do with architecture, Intel only had to be "good enough".
Graviton2 is not good enough, Graviton3 might be. Having good enough CPU without paying AMD/Intel margins will give huge advantages in TCO and push competition towards ARM as well.

AMD is a bit of wildcard here, cause they are riding the same ARM market fueled process advantage wave, but IMO they are too small to matter in long run.

Systems analyst · Sep 26, 2020

JoeRambo said:
It is changing. There is a ton of money and effort being pumped into ARM ecosystem. From gcc/llvm compilers to app porting to system level software like JVM's and Linux distros, there is mindshare growing each day.
What is currently missing, is "desktop" development machine for ARM and Apple is going to solve this not only for their chips, but for others too.

Without doubt cloud and hyper scalers will migrate first. The rest will take longer and won't be complete.

On business scale, what is happening is what Intel did to RISC competition in 1990s, cheap, high volume customer chips enabled pouring money into server designs that rode on process development fueled by said customer chips. It had nothing to do with architecture, Intel only had to be "good enough".
Graviton2 is not good enough, Graviton3 might be. Having good enough CPU without paying AMD/Intel margins will give huge advantages in TCO and push competition towards ARM as well.

AMD is a bit of wildcard here, cause they are riding the same ARM market fueled process advantage wave, but IMO they are too small to matter in long run.

I agree. When writing a new application to run on an SVE machine, I would want to experiment on a desktop first, to 'shape' the algorithm's design to take best advantage of SVE. There is no substitute for practice. It can then be ported to the cloud, for further tests.

AS for Graviton2, it is useful already for various use cases; ARM themselves use it for some use cases, but not all, when testing new designs.
Graviton3 will cover more use cases.

moinmoin · Sep 26, 2020

@JoeRambo Another wildcard is how the ecosystem takes Nvidia's ownership of Arm, should the deal go through. While I agree x86 should have a hard time again Arm in the long run, Nvidia has plenty chances to mess that up once they are in control.

Thala · Sep 26, 2020

LightningZ71 said:
I wonder, can a surface pro x even be loaded with Linux?

Well it comes with WSL/WSL2 out of the box including a Microsoft Linux Kernel - which lets you install several Linux distributions.

Doug S · Sep 26, 2020

Markfw said:
Well, I am certainly not impressed withthe A13 in my iphone 11. Slower than my Galaxy 9 !

That's ridiculous, you'd have to search far and wide to come up with something the S9's CPU does faster than the iPhone's. If you want to make this claim, let's see the numbers that back it up.

LightningZ71 · Sep 26, 2020

I've been doing some more digging on the SBCs and there's a common thread with most of them. They have rather limited memory and storage systems. Only one or two of them have anything faster than eMMC or USB attached storage systems. Most of them have very anemic ram setups as well.

The only SBCs that seem to take memory and storage seriously are the Nvidia Xavier models. Unfortunately, the roughly A75 ish cores on the Xavier's aren't especially potent in raw compute, though, with code and libraries that are specifically targeted at the advanced features that they do have, they are decent for their generation. What they do have, though, is a very impressive cache and memory subsystem. If the code being run on them is memory and I/O intensive, they should blow the figurative doors off if any of the other SBC computers. So, it is going to be VERY situational with them on how good they can be.

The other thing to keep in mind is that, with the Xavier's, if you are just running things on their ARM cores, you're really wasting the rest of their resources. They have extensive GPU and ML capabilities. They are volta level CUDA compliant, so you would have your best results on producing usable work from them if you were to also harness those resources as well. Perhaps running one GPU compute task on the GPU section and five CPU tasks on the ARM cores would give you the absolute best results.

In addition to the above hardware situation, you have issues with respect to kernel and module software support across the whole ecosystem of SBCs. The iGPU sections tend to be very poorly supported under linux, and there are precious few projects out there that can even use them in their current form. This issue makes the Xavier's stand out even more. Nvidia and the community is providing a considerable amount of software and driver support for them. That extensive GPU section is quite accessible.

I close with this: none of the SBCs are physically capable of offering enough raw performance on the cpu side alone to be able to even approach the throughput of even mid level dedicated GPUs. They are so heavily optimized for the types of embarrassingly parallel functions that a lot of distributed computing tasks perform that they produce multiples of the performance per watt of even these power sipping SBCs run in parallel. Yes, some tasks just aren't suited to them. In those cases, the question is, do they manage to produce more compute performance per watt than a power constrained 3990x? On a minimalist board, with as much disabled on it as possible, with the TDP turned down, can a gang of SBCs manage to outproduce it per watt? You can probably buy enough of them with the same money to out produce it, but you'll probably chew through a significant amount more power, not to mention the headache of managing the better part of a hundred nodes in the process.

I think that the next place to look may be lower mid-range phones. Many of them have decently potent CPUs on somewhat modern process technologies. Especially for DC projects that work with the Android version of BOINC, they might be worth looking at. With a reasonable duty cycle set, and an efficient gang charger with short wires and decent ventilation, you might get a surprisingly good return on them.

Markfw · Sep 26, 2020

Doug S said:
That's ridiculous, you'd have to search far and wide to come up with something the S9's CPU does faster than the iPhone's. If you want to make this claim, let's see the numbers that back it up.

Do you have a cochlear implant like me ? Do you use your phone to control it ? There re no benchmarks for this, so you will have to take my word for it. If you don't like that too bad, go kill your hearing with drugs or something and get a cochlear implant. Then talk to me.

Review Raspberry PI (ARM A72) vs EPYC for DC study. Interesting results.

Moderator Emeritus, Elite Member

Golden Member

Diamond Member

Golden Member

Moderator Emeritus, Elite Member

Diamond Member

Moderator Emeritus, Elite Member

Golden Member

Moderator Emeritus, Elite Member

Platinum Member

No Lifer

Lifer

No Lifer

Diamond Member

Diamond Member

Diamond Member

Moderator Emeritus, Elite Member

Diamond Member

Moderator Emeritus, Elite Member

Golden Member

Member

Diamond Member

Golden Member

Diamond Member

Platinum Member

Moderator Emeritus, Elite Member