> Rosetta does not support the translation of AVX, AVX2, and AVX-512 instructions.

Yes, but it translates up to SSE4.2.
> Rosetta does not support the translation of AVX, AVX2, and AVX-512 instructions.

Is this also the case for the SSE family of SIMD instruction sets? The FLAC codebase also has code paths that use SIMD intrinsics for each version of the SSE instruction family. In theory, SSE instructions operating on the 128-bit XMM registers should be translatable to ARM NEON instructions, which also operate on 128-bit vectors. However, SSE instructions may not map cleanly onto NEON, since NEON uses a 3-operand format while (legacy, non-VEX) SSE uses a destructive 2-operand format.
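To illustrate the 128-bit correspondence the question describes, here is a minimal sketch in C of one lane-wise float add written for both instruction sets. It assumes GCC or Clang, which predefine `__SSE__` on x86 targets and `__ARM_NEON` on ARM targets; `add4` is a hypothetical name. This shows source-level portability only, not how Rosetta translates binaries.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE: __m128, _mm_loadu_ps, _mm_add_ps */
#elif defined(__ARM_NEON)
#include <arm_neon.h>    /* NEON: float32x4_t, vld1q_f32, vaddq_f32 */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for i = 0..3.
   Both ISAs do this with one 128-bit vector of four 32-bit floats. */
void add4(const float *a, const float *b, float *out)
{
#if defined(__SSE__)
    /* Legacy (non-VEX) ADDPS is destructive: ADDPS xmm1, xmm2 computes
       xmm1 = xmm1 + xmm2, so the compiler adds a register copy whenever
       it still needs the old value of the first operand. */
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
#elif defined(__ARM_NEON)
    /* NEON FADD Vd.4S, Vn.4S, Vm.4S has a separate destination,
       so neither input register is overwritten. */
    float32x4_t va = vld1q_f32(a);
    float32x4_t vb = vld1q_f32(b);
    vst1q_f32(out, vaddq_f32(va, vb));
#else
    /* Portable scalar fallback for any other target. */
    for (size_t i = 0; i < 4; i++)
        out[i] = a[i] + b[i];
#endif
}
```

When the same SSE source is compiled with AVX enabled, compilers emit the VEX-encoded `vaddps`, which is also 3-operand, so the destructive form mainly matters for binaries built for plain SSE.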
> That is unfortunately a bigger problem with Phoronix's test suite. Many packages are heavily hand-optimized with x86 assembly and SIMD intrinsics.

Calling that "unfortunate" and a "bigger problem" makes it sound like it would be a bad thing for code to be optimized. Would that also apply to any M1 optimizations?
That is nonsense, start to finish.
> All 3 paragraphs are wrong. You could go research why, but I fear you won't.

It's pretty much the truth, sadly...
> Apple chose ARM, fully knowing about the ecosystem surrounding it. I argue that the problem isn't with the Phoronix test suite. Instead, I argue that it's a problem for Apple, given that that's the state of software today. If they want to fix it, they can bloody well pay programmers to fix it for them.

There is nothing Apple needs to "fix" here. Phoronix is a poor choice for cross-platform benchmarking, and it's not really representative of the commercial software most people will run.
> Google Chrome is an active app on macOS, and Safari is also a native app on Windows 10.

Safari on Windows?
> Optimizing assembler is a hobby activity these days, not how production software is built.

Not sure I agree with that. Are you suggesting that there's no SIMD support at all in "production" software?
> Safari on Windows?

You're right. I was looking at the official Safari page, and it confused me when it compared Safari on macOS to Windows 10 browsers.
I thought Apple dropped Safari for Windows nearly a decade ago?
Wikipedia says Safari was available for Windows from 2007 until 2012.
> Not sure I agree with that. Are you suggesting that there's no SIMD support at all in "production" software?

Pretty sure you don't need hand-tuned assembler to use SIMD instructions.
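As a sketch of using SIMD from plain C without any assembly, the hypothetical function below sums an array with SSE intrinsics from `<xmmintrin.h>`; the compiler still handles register allocation and instruction selection. It builds only for x86/x86-64 targets.

```c
#include <stddef.h>
#include <xmmintrin.h>   /* SSE intrinsics; x86/x86-64 only */

/* Hypothetical helper: sums n floats four at a time in one 128-bit register. */
float sum_sse(const float *x, size_t n)
{
    __m128 acc = _mm_setzero_ps();
    size_t i = 0;

    /* Main loop: one ADDPS adds four floats per iteration. */
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(x + i));

    /* Horizontal reduction: collapse the four lanes into one float. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float total = lanes[0] + lanes[1] + lanes[2] + lanes[3];

    /* Scalar tail for the last n % 4 elements. */
    for (; i < n; i++)
        total += x[i];
    return total;
}
```

Because the additions happen in a different order than a plain scalar loop, results on general floating-point data can differ in the last bits.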
> Pretty sure you don't need hand-tuned assembler to use SIMD instructions.

If that is the case, then what's the advantage of "hand-tuned assembler"?
> It's pretty much the truth, sadly...

The M1 is as fast as the Mac Pro in WebKit compile time (according to one reviewer; even coming close is absurd). It compiles Xcode as fast as a 32-thread 3950X Hackintosh. It's within 20% of the i9 + dGPU 16" MacBook Pro in Final Cut rendering. Apple gets the same or better x86 performance in the Mac mini, MacBook Air, and 13" MacBook Pro via emulation, and much better native performance, for less cost and less power consumption.
They're touting mediocre emulation performance on 5nm chips, and they don't want to release optimization manuals the way AMD or Intel do, because they'd prefer to keep changing architectures rather than have developers optimize their software for hardware designs that are inconsistent from generation to generation...
Rosetta numbers will arguably be closer to reality than many people think, because why even bother optimizing for Apple platforms when they don't want to guarantee compatibility?
> ...enabled by 3-operand ops and having 32 architectural regs instead of 8 SSE has.

SSE has 8 registers in 32-bit mode but 16 in x86-64.
> If that is the case, then what's the advantage of "hand-tuned assembler"?

Same as it's always been: squeezing out every performance advantage.
> Same as it's always been: squeezing out every performance advantage.

. . . uh-huh.
> Meaning a lot of algorithms have a ton of register spills, moves, saves, etc.

x86-64 has 16 vector registers. The EVEX prefix that was introduced alongside AVX-512 extends that to 32.
When doing static translation like Rosetta does, no one is forcing Apple to do the mapping 1:1. They can simply apply optimizations enabled by 3-operand ops and having 32 architectural regs instead of the 8 SSE has.
> If that is the case, then what's the advantage of "hand-tuned assembler"?

Hand-tuned assembler is very rare. Compared to intrinsics, you can hope to do better register allocation, but even that is rare these days. You also have more control over loop unrolling and software (SW) pipelining, which can help.
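A minimal sketch of manual unrolling that stays at the intrinsics level rather than dropping to assembly: two independent accumulators let consecutive `_mm_add_ps` operations proceed without waiting on each other's result. The function name is hypothetical, the code is x86-only, and whether it is actually faster than a single-accumulator loop depends on the CPU and on how the compiler schedules it.

```c
#include <stddef.h>
#include <xmmintrin.h>   /* SSE intrinsics; x86/x86-64 only */

/* Hypothetical helper: array sum unrolled by two vectors (8 floats). */
float sum_sse_unrolled(const float *x, size_t n)
{
    __m128 acc0 = _mm_setzero_ps();   /* independent dependency chain #1 */
    __m128 acc1 = _mm_setzero_ps();   /* independent dependency chain #2 */
    size_t i = 0;

    /* Unrolled loop: the two adds do not depend on each other, so their
       latencies can overlap in the pipeline. */
    for (; i + 8 <= n; i += 8) {
        acc0 = _mm_add_ps(acc0, _mm_loadu_ps(x + i));
        acc1 = _mm_add_ps(acc1, _mm_loadu_ps(x + i + 4));
    }

    /* At most one leftover full vector of 4 floats. */
    for (; i + 4 <= n; i += 4)
        acc0 = _mm_add_ps(acc0, _mm_loadu_ps(x + i));

    /* Merge the accumulators, then reduce the four lanes. */
    __m128 acc = _mm_add_ps(acc0, acc1);
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float total = lanes[0] + lanes[1] + lanes[2] + lanes[3];

    /* Scalar tail for the last n % 4 elements. */
    for (; i < n; i++)
        total += x[i];
    return total;
}
```

The compiler remains free to reorder these instructions, which is part of why hand-written assembly is occasionally still chosen when exact scheduling matters.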
> The issue is: what is the compiler going to do when it hits AVX or AVX2 code paths? Answer: puke all over the place, or revert to a non-SIMD code path, since the M1 doesn't support any of x86's SIMD at all.

Technically, you have to explicitly make sure that the ARM compiler never sees the SSE/AVX code path, either via preprocessor directives or by moving it into separate files.
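A sketch of the preprocessor approach, assuming the GCC/Clang predefined macros `__AVX2__` and `__SSE2__`: the x86 headers and intrinsics are only included and compiled when the target enables them, so an ARM build sees just the portable loop. `add_i32` is a hypothetical name. This is a compile-time choice; an x86 binary built without `-mavx2` takes the SSE2 path even on AVX2 hardware, which is why many projects add runtime CPU detection on top.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__AVX2__)
#include <immintrin.h>   /* only parsed when the target enables AVX2 */
#elif defined(__SSE2__)
#include <emmintrin.h>   /* only parsed when the target enables SSE2 */
#endif

/* Hypothetical helper: out[i] = a[i] + b[i] for n int32 values. */
void add_i32(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    size_t i = 0;
#if defined(__AVX2__)
    /* 256-bit path: 8 int32 lanes per iteration. */
    for (; i + 8 <= n; i += 8) {
        __m256i va = _mm256_loadu_si256((const __m256i *)(a + i));
        __m256i vb = _mm256_loadu_si256((const __m256i *)(b + i));
        _mm256_storeu_si256((__m256i *)(out + i), _mm256_add_epi32(va, vb));
    }
#elif defined(__SSE2__)
    /* 128-bit path: 4 int32 lanes per iteration. */
    for (; i + 4 <= n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_add_epi32(va, vb));
    }
#endif
    /* Scalar tail on x86; on non-x86 targets (e.g. ARM) this loop does all
       the work, and the compiler may auto-vectorize it. */
    for (; i < n; i++)
        out[i] = a[i] + b[i];
}
```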
> SSE has 8 registers in 32-bit mode but 16 in x86-64.

Good point, I keep forgetting Apple is x64-only, so it's 16 registers.
> There probably wasn't a whole heck of a lot more "hand-tuned assembler" involved in any of those applications than there is in "commercial software packages".

There is typically near ZERO in commercial software, in my experience. It's rare in the extreme. That was the point. If there is any at all, it's kind of an anomaly.
> What Apple has done is stop differentiating computers by processor and instead differentiate them only by memory and storage, just like phones. People are used to buying phones, and it is much easier to explain to the end user why they should buy more storage than why they should get a +200 MHz processor. Apple doesn't sell CPUs; they sell a complete package, so they don't need a lot of different CPUs. They have one for each generation, the most powerful they can come up with. Intel's naming scheme and its different generations of Core processors in the mobile market were also getting absurd.

I get what you're saying, but the MacBook Pro and MacBook Air have the same memory options and the same storage options. Ironically, the MacBook Air has a slightly lower-performing SoC in some configurations due to the lower-end GPU.
But just like phones, Apple's prices start at midrange and quickly escalate once you add storage.
> I get what you're saying, but the MacBook Pro and MacBook Air have the same memory options and the same storage options. Ironically, the MacBook Air has a slightly lower-performing SoC in some configurations due to the lower-end GPU.

They had to find a place for the chips with defects.
> I mean, they're kind of useful if you want to know how poorly a rushed, sloppy port will do on the M1, so I guess it has that going for it.

To be honest, it's more of an ego thing on this forum. A lot of people here have Ryzen systems or are AMD fans. The minute their PC master race systems get blown away by a tiny MacBook Air in common applications that most people use, they start looking for ways to boost their egos.