Question Apple Silicon M1 series thread, including M1 Pro, M1 Max - Geekbench 5 single-core >1700


lopri

Elite Member
Jul 27, 2002
12,842
325
126
It's a giant problem for all the usual x86 players. I am a PC person, but if I were in a situation to spend $2K on a laptop, it would be difficult to justify getting anything other than the new MacBook.
 

software_engineer

Junior Member
Jul 26, 2020
8
11
41
Hmm... There are a few errors in there. For example, the Geekbench scores they provide as Mac mini M1 native are actually Rosetta scores.

This FLAC encoding one I find interesting though. The Rosetta score is way, way faster than the native M1 score. o_O

View attachment 34312
The x86 build of FLAC seems to make use of x86 SIMD intrinsics in addition to x86 assembly. I don't see any evidence of any use of ARM SIMD intrinsics or of ARM assembly in the FLAC codebase, so that is likely to explain the performance disparity between the native ARM build of FLAC and the x86 build of FLAC run via Rosetta.
 

moinmoin

Platinum Member
Jun 1, 2017
2,748
3,618
136
So Rosetta generates better ARM code than humans?
It's not far-fetched that a lot of existing ARM code is pretty bare-bones with regard to optimizations and that Rosetta is good at translating existing assembly and SIMD to ARM equivalents. To be honest, I'm positively impressed that there are so many cases where native is already clearly better than Rosetta for exactly this reason. Now imagine all code getting actually optimized for the M1's capabilities.
 

guidryp

Golden Member
Apr 3, 2006
1,398
1,525
136
The x86 build of FLAC seems to make use of x86 SIMD intrinsics in addition to x86 assembly. I don't see any evidence of any use of ARM SIMD intrinsics or of ARM assembly in the FLAC codebase, so that is likely to explain the performance disparity between the native ARM build of FLAC and the x86 build of FLAC run via Rosetta.
This is some indication of how these benchmarks might be skewed by heavy optimization for x86 vs unoptimized code for ARM.

I note that Kvazaar is also "written in the C programming language and optimized in Assembly". I would expect a lot of effort went into hand-tuned x86 assembly, vs none on the ARM side.
 

DrMrLordX

Lifer
Apr 27, 2000
17,791
6,787
136
The second sentence refers to the entire chip, aka SoC.
You actually bothered to differentiate between the two? Pff whatever. Next time say "SoC" if that's what you mean . . .

tests by Phoronix are more representative.
Glad you pasted that! Though . . .

This FLAC encoding one I find interesting though. The Rosetta score is way, way faster than the native M1 score. o_O
Gonna have to take a minute to parse all that data, since Phoronix typically throws a lot of stuff at you and not necessarily in useful context, but it does look like some of Phoronix's attempts to natively compile FOSS for the M1 resulted in a lot of unoptimized code.

Mediocre showing overall ...
I don't necessarily agree. When running software that's been ready from day one (or nearly day one) from vendors optimizing specifically for M1, it looks really good. It's only going to lose some MT benchmarks to some higher-power CPUs that probably won't ever run Big Sur anyway. It has the usual Mac problems but it's hard to ding the M1 for that specifically.
 

ThatBuzzkiller

Golden Member
Nov 14, 2014
1,078
205
106
I don't necessarily agree. When running software that's been ready from day one (or nearly day one) from vendors optimizing specifically for M1, it looks really good. It's only going to lose some MT benchmarks to some higher-power CPUs that probably won't ever run Big Sur anyway. It has the usual Mac problems but it's hard to ding the M1 for that specifically.
The M1 basically boils down to high quality Java/browser performance but that's not a surprise since previous Apple designed ICs were already good at those benchmarks and then some ...

If you look at the Rosetta numbers specifically, the M1 is mediocre given all its circumstances. Apple just wants to keep paying the emulation or high-level-abstraction tax ...

Apple doesn't like low-level programming and in fact discourages it, since they don't want to release documentation for their CPUs the way AMD and Intel do. AMD and Intel will forever have the edge when they want developers to micro-optimize for their architectures ...
 

guidryp

Golden Member
Apr 3, 2006
1,398
1,525
136
The M1 basically boils down to high quality Java/browser performance but that's not a surprise since previous Apple designed ICs were already good at those benchmarks and then some ...

If you look at the Rosetta numbers specifically, the M1 is mediocre given all its circumstances. Apple just wants to keep paying the emulation or high-level-abstraction tax ...

Apple doesn't like low-level programming and in fact discourages it, since they don't want to release documentation for their CPUs the way AMD and Intel do. AMD and Intel will forever have the edge when they want developers to micro-optimize for their architectures ...
That is nonsense, start to finish.
 

amrnuke

Golden Member
Apr 24, 2019
1,165
1,730
106
The M1 basically boils down to high quality Java/browser performance but that's not a surprise since previous Apple designed ICs were already good at those benchmarks and then some ...

If you look at the Rosetta numbers specifically, the M1 is mediocre given all its circumstances. Apple just wants to keep paying the emulation or high-level-abstraction tax ...

Apple doesn't like low-level programming and in fact discourages it, since they don't want to release documentation for their CPUs the way AMD and Intel do. AMD and Intel will forever have the edge when they want developers to micro-optimize for their architectures ...
All 3 paragraphs are wrong. You could go research why, but I fear you won't.
 

DrMrLordX

Lifer
Apr 27, 2000
17,791
6,787
136
If you look at the Rosetta numbers specifically
You can safely ignore most of those. Rosetta 2 serves the same basic purpose that Rosetta did back in the day: a transition kludge to get M1 buyers through until software vendors compile and optimize with M1 as a target. Not all software will "make it", but a fair amount is already available, with more to come. Anyone who's serious about selling software on macOS needs to recompile. It's just that simple.

Try to look more at the M1 results that are native and (unlike the FLAC numbers) outperform the Rosetta 2 results from the same benchmark.
 

insertcarehere

Senior member
Jan 17, 2013
409
279
136
The M1 basically boils down to high quality Java/browser performance but that's not a surprise since previous Apple designed ICs were already good at those benchmarks and then some ...

If you look at the Rosetta numbers specifically, the M1 is mediocre given all its circumstances. Apple just wants to keep paying the emulation or high-level-abstraction tax ...
The M1 can't beat devices with discrete graphics at gaming under Rosetta, so it must only be good for JavaScript, am I right?

 

Doug S

Senior member
Feb 8, 2020
768
1,085
96
It translates hand-tuned and hand-vectorized SIMD code into ARM SIMD instructions that are then run on powerful hardware at near-native speeds, while the actual ARM port is typical of current ARM ports: very little optimization, if any.
Rosetta 2 doesn't handle AVX; it only goes up to SSE 4.2. If the native ARM code is just compiled, it may not be vectorized at all -- often you need to arrange the source code in a certain way for the compiler to recognize it can be vectorized. So it is easy to see why static translation of the SSE 4.2 code path could be faster than native ARM code that doesn't use vectorization at all.

This isn't going to be a problem for long; stuff that is popular on the Mac will get optimized, vectorized ARM code (or maybe use the GPU, NPU, or ISP blocks to go even faster in certain cases).

Phoronix's tests were using various open source software packages popular on Linux that may not be used much at all on the Mac.
 

dmens

Platinum Member
Mar 18, 2005
2,237
819
136
All AMD (or Intel) has to do is wait until they can transition to the latest process as well and they'll be able to automatically undo any of the gains that either Apple or any ARM vendor achieved in their designs ...
LOL why wait? Intel can use that vaunted war chest and pay TSMC for just a few 5nm wafer starts, then presto, problem solved, right?

Oh wait, Intel can fab their trash designs on TSMC and it would still be trash. Garbage in, garbage out. Sorry.
 

StinkyPinky

Diamond Member
Jul 6, 2002
6,588
505
126
Very interested to see if Apple uses an upscaled variant of this for their desktops. A 16-core variant with cooling could be a beast.
 

Thala

Golden Member
Nov 12, 2014
1,266
577
136
It's not far-fetched that a lot of existing ARM code is pretty bare-bones with regard to optimizations and that Rosetta is good at translating existing assembly and SIMD to ARM equivalents. To be honest, I'm positively impressed that there are so many cases where native is already clearly better than Rosetta for exactly this reason. Now imagine all code getting actually optimized for the M1's capabilities.
That is unfortunately a bigger problem with Phoronix's test suite. Many packages are heavily hand-optimized with x86 assembly and SIMD intrinsics.
 

LightningZ71

Senior member
Mar 10, 2017
915
885
136
Apple chose ARM, fully knowing about the ecosystem surrounding it. I argue that the problem isn't with the Phoronix test suite; rather, it's a problem for Apple, given that that's the state of software today. If they want to fix it, they can bloody well pay programmers to fix it for them.
 

Thala

Golden Member
Nov 12, 2014
1,266
577
136
Apple chose ARM, fully knowing about the ecosystem surrounding it. I argue that the problem isn't with the Phoronix test suite; rather, it's a problem for Apple, given that that's the state of software today. If they want to fix it, they can bloody well pay programmers to fix it for them.
This was not my point. Of course the larger ARM ecosystem will eventually make sure that these issues get fixed; whether it is Apple, someone else like Amazon, or even the open source community does not really matter.
I am, for instance, looking into improving the Intel Embree library with respect to Arm NEON; it is used by Blender, Maxon Cinema and other 3D applications. If you compile Embree from the official sources, there is just a C++ code path available for ARM.
 
Last edited:

ricebunny2020

Junior Member
Nov 19, 2020
2
4
36
The x86 build of FLAC seems to make use of x86 SIMD intrinsics in addition to x86 assembly. I don't see any evidence of any use of ARM SIMD intrinsics or of ARM assembly in the FLAC codebase, so that is likely to explain the performance disparity between the native ARM build of FLAC and the x86 build of FLAC run via Rosetta.
Rosetta does not support the translation of AVX, AVX2, and AVX512 instructions.
 
