Roland00Address
Platinum Member
- Dec 17, 2008
> Hmm... There are a few errors in there. For example, the Geekbench scores they provide as Mac mini M1 native are actually Rosetta scores.
> This FLAC encoding one I find interesting, though. The Rosetta score is way, way faster than the native M1 score.
The x86 build of FLAC seems to make use of x86 SIMD intrinsics in addition to x86 assembly. I don't see any evidence of any use of ARM SIMD intrinsics or of ARM assembly in the FLAC codebase, so that is likely to explain the performance disparity between the native ARM build of FLAC and the x86 build of FLAC run via Rosetta.
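To make the per-architecture point concrete, here is a minimal, hypothetical C kernel (not taken from FLAC) written the way SIMD-optimized code is often structured: an SSE2 intrinsic path for x86, a NEON intrinsic path for ARM, and a scalar fallback. If the NEON branch were missing, an ARM build would silently compile only the scalar loop, which is the situation described above. The function name `sum_i32` is illustrative only.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#elif defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical kernel: sum of an int32 buffer. Not FLAC's actual code. */
int32_t sum_i32(const int32_t *x, size_t n)
{
    size_t i = 0;
    int32_t total = 0;
#if defined(__SSE2__)
    /* x86 path: four 32-bit lanes per SSE2 register. */
    __m128i acc = _mm_setzero_si128();
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_epi32(acc, _mm_loadu_si128((const __m128i *)(x + i)));
    int32_t lanes[4];
    _mm_storeu_si128((__m128i *)lanes, acc);
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#elif defined(__ARM_NEON)
    /* ARM path: the same idea with four 32-bit NEON lanes. */
    int32x4_t acc = vdupq_n_s32(0);
    for (; i + 4 <= n; i += 4)
        acc = vaddq_s32(acc, vld1q_s32(x + i));
    int32_t lanes[4];
    vst1q_s32(lanes, acc);
    total = lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
    /* Scalar tail; this is the whole loop when no SIMD path was compiled in. */
    for (; i < n; i++)
        total += x[i];
    return total;
}
```

On x86-64, GCC and Clang define `__SSE2__` by default, so the first branch is taken there; on AArch64 they define `__ARM_NEON`.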
> The x86 build of FLAC seems to make use of x86 SIMD intrinsics in addition to x86 assembly. I don't see any evidence of any use of ARM SIMD intrinsics or of ARM assembly in the FLAC codebase, so that is likely to explain the performance disparity between the native ARM build of FLAC and the x86 build of FLAC run via Rosetta.
So Rosetta generates better ARM code than humans?
> So Rosetta generates better ARM code than humans?
It translates hand-tuned and hand-vectorized SIMD code into ARM SIMD instructions that then run on powerful hardware at near-native speeds, while the actual ARM port is typical of current ARM ports: very little, if any, optimization.
> So Rosetta generates better ARM code than humans?
It's not far-fetched that a lot of existing ARM code is pretty bare-bones with regard to optimizations and that Rosetta is good at translating existing assembly and SIMD to ARM equivalents. To be honest, I'm positively impressed, for exactly this reason, that there are so many cases where native is already clearly better than Rosetta. Now imagine all code actually getting optimized for the M1's capabilities.
> The x86 build of FLAC seems to make use of x86 SIMD intrinsics in addition to x86 assembly. I don't see any evidence of any use of ARM SIMD intrinsics or of ARM assembly in the FLAC codebase, so that is likely to explain the performance disparity between the native ARM build of FLAC and the x86 build of FLAC run via Rosetta.
This is some indication of how these benchmarks might be skewed by heavy optimization for x86 versus unoptimized code for ARM.
> The second sentence refers to the entire chip, aka SoC.
You actually bothered to differentiate between the two? Pff, whatever. Next time say "SoC" if that's what you mean . . .
> Tests by Phoronix are more representative.
Glad you pasted that! Though . . .
> This FLAC encoding one I find interesting, though. The Rosetta score is way, way faster than the native M1 score.
Gonna have to take a minute to parse all that data, since Phoronix typically throws a lot of stuff at you and not necessarily in useful context, but it does look like some of Phoronix's attempts to natively compile FOSS for the M1 resulted in a lot of unoptimized code.
> Mediocre showing overall ...
I don't necessarily agree. When running software that's been ready from day one (or nearly day one) from vendors optimizing specifically for M1, it looks really good. It's only going to lose some MT benchmarks to some higher-power CPUs that probably won't ever run Big Sur anyway. It has the usual Mac problems, but it's hard to ding the M1 for that specifically.
> I don't necessarily agree. When running software that's been ready from day one (or nearly day one) from vendors optimizing specifically for M1, it looks really good. It's only going to lose some MT benchmarks to some higher-power CPUs that probably won't ever run Big Sur anyway. It has the usual Mac problems, but it's hard to ding the M1 for that specifically.
The M1 basically boils down to high-quality Java/browser performance, but that's not a surprise since previous Apple-designed ICs were already good at those benchmarks and then some ...
> The M1 basically boils down to high-quality Java/browser performance, but that's not a surprise since previous Apple-designed ICs were already good at those benchmarks and then some ...
> If you look at the Rosetta numbers specifically, the M1 is mediocre given all its circumstances. Apple just wants to keep paying the emulation or high-level abstraction tax ...
> Apple doesn't like low-level programming and in fact discourages it, since they don't release documentation for their CPUs the way AMD or Intel do. AMD and Intel will forever have the edge when they want developers to micro-optimize for their architectures ...
That is nonsense, start to finish.
> The M1 basically boils down to high-quality Java/browser performance, but that's not a surprise since previous Apple-designed ICs were already good at those benchmarks and then some ...
> If you look at the Rosetta numbers specifically, the M1 is mediocre given all its circumstances. Apple just wants to keep paying the emulation or high-level abstraction tax ...
> Apple doesn't like low-level programming and in fact discourages it, since they don't release documentation for their CPUs the way AMD or Intel do. AMD and Intel will forever have the edge when they want developers to micro-optimize for their architectures ...
All 3 paragraphs are wrong. You could go research why, but I fear you won't.
> If you look at the Rosetta numbers specifically
You can safely ignore most of those. Rosetta 2 serves the same basic purpose that Rosetta did back in the day: a transition kludge to get M1 buyers through until software vendors compile and optimize with M1 as a target. Not all software will "make it", but you can already get a fair amount of native software, with more to come. Anyone who's serious about selling software on macOS needs to recompile. It's just that simple.
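Since part of this thread turns on whether a published score was native or translated, a benchmark can check this for itself. Apple's documentation for the Rosetta translation environment describes querying the `sysctl.proc_translated` key with `sysctlbyname`. Below is a minimal sketch assuming that documented behavior (1 means translated, 0 means native, and the key is absent where translation does not exist); the helper name `process_is_translated` is hypothetical, and the non-Apple branch simply reports "not translated".

```c
#include <errno.h>
#include <stddef.h>

#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

/* Hypothetical helper: 1 if running under Rosetta translation,
   0 if native, -1 if the query failed for an unexpected reason. */
int process_is_translated(void)
{
#if defined(__APPLE__)
    int ret = 0;
    size_t size = sizeof(ret);
    if (sysctlbyname("sysctl.proc_translated", &ret, &size, NULL, 0) == -1) {
        /* ENOENT: the key does not exist, so the process is native. */
        if (errno == ENOENT)
            return 0;
        return -1;
    }
    return ret;
#else
    /* Not macOS: Rosetta does not apply. */
    return 0;
#endif
}
```

A benchmark harness could print this value next to its results so that Rosetta scores are not mislabeled as native ones.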
> The M1 basically boils down to high-quality Java/browser performance, but that's not a surprise since previous Apple-designed ICs were already good at those benchmarks and then some ...
> If you look at the Rosetta numbers specifically, the M1 is mediocre given all its circumstances. Apple just wants to keep paying the emulation or high-level abstraction tax ...
The M1 can't beat devices with discrete graphics when gaming under Rosetta, so it must only be good for JavaScript, am I right?
> It translates hand-tuned and hand-vectorized SIMD code into ARM SIMD instructions that then run on powerful hardware at near-native speeds, while the actual ARM port is typical of current ARM ports: very little, if any, optimization.
Rosetta 2 doesn't handle AVX; it only goes up to SSE4.2. If the native ARM code is just compiled, it may not be vectorized at all: often you need to arrange the source code in a certain way for the compiler to recognize that it can be vectorized. So it is easy to see why static translation of the SSE4.2 code path could be faster than native ARM code that doesn't use vectorization at all.
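The point about arranging source code for the vectorizer can be illustrated with a floating-point sum, a common case. This is a generic C sketch, not code from FLAC or the Phoronix suite, and both function names are hypothetical. Under standard floating-point semantics the compiler may not reorder the additions in `sum_serial`, so GCC and Clang typically leave it scalar unless flags such as `-ffast-math` are given. `sum_4way` splits the work into four independent partial sums, a shape the compiler can map onto 4-wide SSE or NEON registers.

```c
#include <stddef.h>

/* One running sum: every addition depends on the previous one, and
   without reassociation flags the compiler must keep this exact order. */
float sum_serial(const float *x, size_t n)
{
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Four independent partial sums: each lane only depends on itself,
   which exposes 4-wide parallelism to the compiler. */
float sum_4way(const float *x, size_t n)
{
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    for (; i < n; i++)   /* leftover elements */
        s0 += x[i];
    return (s0 + s1) + (s2 + s3);
}
```

Whether a given loop was actually vectorized can be checked with compiler diagnostics such as Clang's `-Rpass=loop-vectorize` and `-Rpass-missed=loop-vectorize`, or GCC's `-fopt-info-vec-missed`. Because the order of additions changes, `sum_4way` can round slightly differently from `sum_serial` on general inputs.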
> All AMD (or Intel) has to do is wait until they can transition to the latest process as well, and they'll be able to automatically undo any of the gains that either Apple or any ARM vendor achieved in their designs ...
LOL, why wait? Intel can use that vaunted war chest and pay TSMC for just a few 5nm wafer starts, then presto, problem solved, right?
> It's not far-fetched that a lot of existing ARM code is pretty bare-bones with regard to optimizations and that Rosetta is good at translating existing assembly and SIMD to ARM equivalents. To be honest, I'm positively impressed, for exactly this reason, that there are so many cases where native is already clearly better than Rosetta. Now imagine all code actually getting optimized for the M1's capabilities.
That is unfortunately a bigger problem with Phoronix's test suite. Many packages are heavily hand-optimized with x86 assembly and SIMD intrinsics.
> Apple chose ARM, fully knowing about the ecosystem surrounding it. I argue that the problem isn't with the Phoronix test suite. Instead, I argue that it's a problem for Apple, given that that's the state of software today. If they want to fix it, they can bloody well pay programmers to fix it for them.
This was not my point. Of course the larger ARM ecosystem will eventually make sure that these issues get fixed; whether that is Apple or someone else like Amazon, or even the open-source community, does not really matter.
> There's only Google Chrome for Windows 10?
Google Chrome is an active app on macOS, and Safari is also a native app on Windows 10.
> The x86 build of FLAC seems to make use of x86 SIMD intrinsics in addition to x86 assembly. I don't see any evidence of any use of ARM SIMD intrinsics or of ARM assembly in the FLAC codebase, so that is likely to explain the performance disparity between the native ARM build of FLAC and the x86 build of FLAC run via Rosetta.
Rosetta does not support the translation of AVX, AVX2, and AVX512 instructions.
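Software that ships an AVX2 path usually selects it at run time, which is how a missing AVX translation can lead to a slower path instead of a crash: the feature check fails and the baseline code runs. Below is a minimal sketch that assumes GCC or Clang on x86 (`__builtin_cpu_supports` and the `target` attribute are compiler extensions) and a translator that, as described above, reports no AVX2 support. The names `sum_i32_avx2`, `sum_i32_baseline` and `pick_sum_kernel` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>

/* AVX2 path: compiled for AVX2 via the target attribute, but only called
   when the processor (or translator) reports AVX2 at run time. */
__attribute__((target("avx2")))
static int32_t sum_i32_avx2(const int32_t *x, size_t n)
{
    __m256i acc = _mm256_setzero_si256();
    size_t i = 0;
    for (; i + 8 <= n; i += 8)
        acc = _mm256_add_epi32(acc, _mm256_loadu_si256((const __m256i *)(x + i)));
    int32_t lanes[8];
    _mm256_storeu_si256((__m256i *)lanes, acc);
    int32_t total = 0;
    for (int k = 0; k < 8; k++)
        total += lanes[k];
    for (; i < n; i++)   /* scalar tail */
        total += x[i];
    return total;
}
#endif

/* Baseline path: plain C that any target can run. */
static int32_t sum_i32_baseline(const int32_t *x, size_t n)
{
    int32_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += x[i];
    return total;
}

typedef int32_t (*sum_fn)(const int32_t *, size_t);

/* Choose the kernel once, based on the features the processor reports. */
sum_fn pick_sum_kernel(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_i32_avx2;
#endif
    return sum_i32_baseline;
}
```

The function pointer is resolved once, so the feature check stays out of the hot loop; under a translator without AVX2 the same binary simply keeps using `sum_i32_baseline`.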