Question Here's how dog slow proper x86 CPU emulation is

Jul 27, 2020
16,208
10,262
106

gdansk

Platinum Member
Feb 8, 2011
2,081
2,570
136
And I will add that if you don't care as much about accuracy, QEMU can do it much more quickly. That is why many PCs can play Nintendo Switch games at full speed.

Accurate emulation is niche, so to some extent I have to agree that they shouldn't really need Pentium 3 support. But if the fork was for fun and experimentation, there's no reason to complain. Maybe Zen 6 will make it feasible.
 

Nothingness

Platinum Member
Jul 3, 2013
2,407
736
136

5950X was the first CPU that could pretend, at a reasonable speed, to be a real Pentium II (!!!).

Let that sink in.

Stumbled onto that page after curiosity led to a google search from this page: https://www.techspot.com/news/102262-ancient-pre-release-version-os2-20-discovered-released.html

Any CPU architecture gurus wanna chime in with their thoughts?
I am not sure how accurate they are, but really accurate models of a complex modern CPU (with pipelines, branch prediction, data prefetchers, etc.) run at less than 100,000 instructions per second. I am talking about models that are used to research and evaluate new microarchitectural features that might be implemented in a design. Of course one can sacrifice accuracy to increase speed, but that limits the studies that can be done.
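
To put that figure in perspective, here is a back-of-the-envelope sketch. The native clock and IPC values are assumptions picked for illustration, not measurements of any real CPU; only the 100,000 instructions per second bound comes from the post.

```python
def slowdown_factor(native_ips: float, simulated_ips: float) -> float:
    """How many times slower the simulator is than the real hardware."""
    return native_ips / simulated_ips


# Assumed figures for illustration only (not measurements):
NATIVE_CLOCK_HZ = 3.0e9   # hypothetical 3 GHz core
NATIVE_IPC = 2.0          # hypothetical sustained instructions per cycle
SIMULATED_IPS = 100_000   # upper bound quoted in the post

native_ips = NATIVE_CLOCK_HZ * NATIVE_IPC
factor = slowdown_factor(native_ips, SIMULATED_IPS)
hours_per_native_second = factor / 3600

print(f"slowdown: {factor:,.0f}x")
print(f"one second of real execution takes about {hours_per_native_second:.1f} hours")
```

Under those assumptions, simulating a single second of real execution takes most of a day, which is why such models are usually run on short traces rather than whole programs.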

I am not sure how far they went with accuracy in 86Box (do they simulate pipelines and caches?), but I am not surprised by what they say.
 

mikeymikec

Lifer
May 19, 2011
17,683
9,530
136
It took until about 3GHz C2D or better to emulate even the measly SNES (especially auxiliary chips) in a cycle-accurate way.

IIRC (though I'm not 100% confident in this memory), my old Haswell i5-4690K (or, if it wasn't that one, its predecessor, an AM3 quad-core Athlon II / Phenom II) couldn't fully handle emulating an Amiga 500 playing SWIV. My 7800X3D does :)

The Amiga 500 sported a 7MHz 68000 CPU and a dedicated GPU.

I wonder if 86Box is the answer to how my wife can properly play Dr Drago's Madcap Chase again. I've tried virtualisation and I think I "kinda" managed it a few times, but also perhaps the game/platform was just buggy back then.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,026
136
Low level emulation of x86 isn’t really needed for most applications to function, but yes, folks screaming from the rooftops about how great ARM is don’t understand that:

1) Emulation is slow, even high level emulation.
2) Emulation in the Windows world will always be buggy/incomplete. Example: We still don’t have a 100% accurate/bug free SNES emulator, much less anything newer.
3) Emulation is not power efficient.
4) A competitor to x86-64 would need to be twice as efficient (both in terms of absolute performance and perf/watt) in order to run x86-64 software at competitive speeds. We aren’t there (yet?)

If you run Linux, the ARM situation is much better as ARM is well supported via open source. The only real reason emulation is needed is proprietary software (Windows, Office, etc)

Disclosure: I have been on multiple emulation projects.
 

Shivansps

Diamond Member
Sep 11, 2013
3,851
1,518
136
Are you sure this is correct? I think some of the info is out of date. I was able to run a K6-2 450 MHz and a Pentium 3 400 MHz on my 5800X with 86Box. Or was it with PCem? I don't remember.

I thought this thread was about ARM emulation of x86
You can run 86Box on ARM Linux; they even provide prebuilt binaries. But this is full x86 emulation regardless of the host system architecture (and not only the CPU; you can emulate a GPU as well, like a Voodoo3). I tested it some time back to see whether a fast and cheap A53 CPU (Allwinner H618) was able to emulate something like a 386/486 at a reasonable speed. (It was not.) I'm eventually going to try it on the RK3588.

I'm not sure 86Box is a good example here. 86Box is intended to recreate an old system in software, emulating the original performance and features of those systems. This is not just emulation. QEMU can do this much faster, especially if you have KVM, but it is just not the same thing.
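
A minimal sketch of what "emulating the original performance" can mean in practice: each guest instruction is charged a cycle cost, and the emulator only runs as many cycles as the original clock would have delivered in a time slice. This is not 86Box's actual code; the cost table, the `TimedCpu` class and `run_slice` are all hypothetical names made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical per-instruction cycle costs; real tables depend on the CPU model.
CYCLE_COST = {"nop": 1, "add": 1, "mul": 10, "load": 3}


@dataclass
class TimedCpu:
    clock_hz: int    # clock of the machine being recreated
    cycles: int = 0  # emulated cycles elapsed so far

    def step(self, opcode: str) -> None:
        """Execute one instruction (a no-op here) and charge its cycle cost."""
        self.cycles += CYCLE_COST[opcode]


def run_slice(cpu: TimedCpu, program: list[str], slice_ms: int) -> int:
    """Run instructions until the emulated clock has advanced by one slice.

    Returns the number of instructions executed. A real emulator would then
    wait for the host clock to catch up, so the guest never runs faster than
    the original hardware did.
    """
    budget = cpu.cycles + cpu.clock_hz * slice_ms // 1000
    executed = 0
    while cpu.cycles < budget:
        cpu.step(program[executed % len(program)])
        executed += 1
    return executed


cpu = TimedCpu(clock_hz=33_000_000)  # e.g. a 33 MHz 486-class guest
count = run_slice(cpu, ["load", "add", "mul", "nop"], slice_ms=1)
print(count, cpu.cycles)
```

The per-instruction accounting is what makes this style slow on the host: every guest instruction pays for bookkeeping that a speed-oriented translator like QEMU simply skips.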

I mean, the XT PC got emulated on a single ESP32 microcontroller (it handles everything in software, even generating the video signal using GPIOs).
 
Jul 27, 2020
16,208
10,262
106
Disclosure: I have been on multiple emulation projects.
Any comments on the story in the OP? What is the single biggest performance improvement an architecture feature can bring to emulation? Larger L1/L2? Cache latency? Branch prediction accuracy?
 
Jul 27, 2020
16,208
10,262
106
I think the lowest I went was Win98 with VirtualBox.
That might be the issue.


VMware and VirtualBox are problematic with Windows 95. If you use those, you may need to disable various acceleration features first, and/or install the Windows 95 CPU speed fixes. Windows 95 has many issues on faster machines or VMs, requiring a number of patches in order to operate. Consider emulators like 86Box or PCem instead.

Also: https://ondras.github.io/drago/game/
 

Nothingness

Platinum Member
Jul 3, 2013
2,407
736
136
Apple has already achieved point 4 with Rosetta 2, which runs at about 70% of native speed, so there is no need to be twice as efficient. Though Captain Obvious surely understands that this is still less efficient than running native apps.
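
The arithmetic behind this disagreement can be sketched as follows. It assumes a simple model in which translated code retains a fixed fraction of native speed, so the required native advantage is just the reciprocal of that fraction; `required_native_speedup` is a hypothetical helper for illustration.

```python
def required_native_speedup(translation_efficiency: float) -> float:
    """Native speed advantage needed for translated code to match the x86 host.

    translation_efficiency is the fraction of native speed retained when
    running translated x86 code (0 < efficiency <= 1).
    """
    if not 0 < translation_efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return 1.0 / translation_efficiency


for eff in (0.5, 0.7):
    print(f"{eff:.0%} efficiency -> needs {required_native_speedup(eff):.2f}x native speed")
```

So "twice as efficient" corresponds to a translator that retains only 50% of native speed, while at the roughly 70% quoted for Rosetta 2 the host needs about a 1.43x advantage to break even.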

Disclosure: I have been on multiple emulation projects.
You obviously have not been on the Rosetta 2 project.
 

Doug S

Platinum Member
Feb 8, 2020
2,254
3,488
136
There is a HUGE difference between emulating x86 so that, e.g., an ARM Mac can run a legacy x86 Mac binary and 100% emulation of an SNES. In the latter you have to emulate the GPU, I/O, etc., and embedded/dedicated gaming devices like that tend to do a lot of crazy stuff and probably have little or no documentation other than what has been reverse engineered.

My first computer was an 8-bit Atari, and I can't imagine how difficult that would be to emulate. Sure, emulating a 6502 running at 1.7 MHz sounds like the easiest thing in the world, but there was so much more to it. I won't bore people with the details, but those who are curious can look it up and see how it generated colors on an NTSC TV, all the programming tricks using horizontal and vertical blanking intervals, the hardware sprites, the goofy sound chip. The worst thing would be the fact that the CPU's execution would be halted for a cycle as each line was sent to the TV, to allow the graphics chip to take the memory access slot the CPU would otherwise have used (known as "cycle stealing"). So not only did you do tricks like turning off the screen during initialization to make it run more quickly, but if you had some code that was cycle counted from the start of a blanking interval (more common in games than you probably think), the emulator would have to account for the exact number of DMA cycles the display stole from the 6502, or it would not be "100% accurate/bug free".
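
A rough sketch of why cycle stealing matters to an emulator: the same cycle-counted code finishes on a different scanline depending on how many cycles the display takes. The cycles-per-scanline figure and the stolen-cycle table below are assumptions for illustration, not values taken from the post or from hardware documentation.

```python
CYCLES_PER_SCANLINE = 114  # assumed machine cycles per scanline

# Hypothetical DMA cost per scanline for each display state; real values
# depend on the graphics mode and are not taken from the post.
STOLEN_CYCLES = {"blank": 0, "text": 40, "bitmap": 20}


def cpu_cycles_available(scanline_modes):
    """Return the CPU cycles left on each scanline after display DMA."""
    return [CYCLES_PER_SCANLINE - STOLEN_CYCLES[m] for m in scanline_modes]


def scanlines_needed(work_cycles, scanline_modes):
    """Count the scanlines that elapse before `work_cycles` of code finish.

    Returns None if the code does not finish within the given scanlines.
    """
    remaining = work_cycles
    for line, available in enumerate(cpu_cycles_available(scanline_modes), start=1):
        remaining -= available
        if remaining <= 0:
            return line
    return None


screen_on = ["text"] * 10
screen_off = ["blank"] * 10
print(scanlines_needed(500, screen_on))   # display DMA active
print(scanlines_needed(500, screen_off))  # display turned off
```

An emulator that ignores the stolen cycles behaves like the screen-off case all the time, so any effect timed against the beam lands on the wrong scanline.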

I know nothing about SNES, but I'm willing to bet there are all kinds of nasty little details like that buried in it that you have to get right if you want to achieve perfection. Apple doesn't have to do any of that stuff when running an x86 Mac binary in user mode; it just has to substitute ARM instructions for x86 instructions and track the x86 register contents (I know I'm simplifying a bit here). It doesn't care about the difference between the AMD GPU the x86 Mac had and the Apple GPU the M1 Mac had, since that's all hidden by the Metal API. So what Rosetta 2 had to do was a walk in the park by comparison to an SNES emulator.
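
The "substitute ARM instructions and track the x86 registers" idea can be sketched with a toy translator. This is not how Rosetta 2 is implemented; the register mapping, the tiny instruction subset and the `translate` function are hypothetical, and status flags are ignored entirely.

```python
# Hypothetical mapping from 32-bit x86 register names to AArch64 W registers.
REG_MAP = {"eax": "w0", "ebx": "w1", "ecx": "w2", "edx": "w3"}


def translate(line: str) -> str:
    """Translate one toy two-operand x86 instruction to AArch64-style text.

    Only mov/add/sub with register or decimal immediate sources are handled,
    and x86 status flags are not modelled.
    """
    op, operands = line.split(maxsplit=1)
    dst, src = [part.strip() for part in operands.split(",")]
    d = REG_MAP[dst]
    # The source is either a mapped register or a decimal immediate.
    s = REG_MAP[src] if src in REG_MAP else f"#{int(src)}"
    if op == "mov":
        return f"mov {d}, {s}"
    if op in ("add", "sub"):
        # x86 is two-operand (dst = dst op src); AArch64 is three-operand.
        return f"{op} {d}, {d}, {s}"
    raise NotImplementedError(op)


program = ["mov eax, 5", "mov ebx, 7", "add eax, ebx", "sub eax, 2"]
for insn in program:
    print(f"{insn:<14} -> {translate(insn)}")
```

Because the guest registers live in fixed host registers, straight-line arithmetic translates almost one to one. The hard parts a real translator faces (flags, memory ordering, self-modifying code, system calls) are exactly what this sketch leaves out.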

As others have pointed out, Rosetta 2 does an amazing job of running x86 Mac binaries and, AFAIK, is not "buggy/incomplete" but runs 100% of them. Given the uplift in M1 performance versus the x86 Macs being replaced, it did so at the same or faster speed to boot. The example of the lack of perfect SNES emulators has nothing to do with whether an ARM device can emulate an x86 device perfectly. The only question is performance, and that depends on the starting bar. Apple benefited from Intel's 10nm problems and from the fact that it was a migration, not an additional platform, which gave Rosetta 2 a static target that was never getting any faster while the hardware it runs on has.
 

eek2121

Platinum Member
Aug 2, 2005
2,930
4,026
136
Any comments on the story in the OP? What is the single biggest performance improvement an architecture feature can bring to emulation? Larger L1/L2? Cache latency? Branch prediction accuracy?
Short of an x86 implementation?
Rosetta 2 does NOT do low level emulation. Despite that, it STILL has a 30-40% hit to performance and a hit to battery life as well.

Rosetta works as well as it does because Apple controls the ecosystem and the hardware. Nobody controls the PC ecosystem. If you build an Application on the Mac you use an Apple toolchain and run it on an Apple OS on Apple hardware.
 

Nothingness

Platinum Member
Jul 3, 2013
2,407
736
136
Any comments on the story in the OP? What is the single biggest performance improvement an architecture feature can bring to emulation? Larger L1/L2? Cache latency? Branch prediction accuracy?
Extra ISA features can help. You can read here what people found Apple added to help Rosetta 2.

Arm has also added some instructions to help (look for FEAT_FlagM/FlagM2 extensions).
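
To illustrate why flag handling is a target for ISA help: ARM's N, Z, C and V condition flags line up with x86's SF, ZF, CF and OF, but x86 also defines a parity flag and an auxiliary carry flag that have no direct ARM counterpart. The sketch below shows the bookkeeping for a single 8-bit add done purely in software; `add8_with_flags` is a hypothetical helper, not taken from any real translator.

```python
def add8_with_flags(a: int, b: int) -> tuple[int, dict[str, int]]:
    """Add two 8-bit values and compute the x86 status flags in software."""
    full = a + b
    result = full & 0xFF
    flags = {
        "CF": int(full > 0xFF),                                # unsigned carry out
        "ZF": int(result == 0),                                # result is zero
        "SF": result >> 7,                                     # sign bit of result
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),  # signed overflow
        "AF": int(((a ^ b ^ result) & 0x10) != 0),             # carry out of bit 3
        "PF": int(bin(result).count("1") % 2 == 0),            # even parity of low byte
    }
    return result, flags


print(add8_with_flags(0x7F, 0x01))
print(add8_with_flags(0xFF, 0x01))
```

A translator that had to do this after every arithmetic instruction would be very slow, so in practice flags are computed lazily or with hardware assistance, which is where extra instructions can pay off.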
 

Nothingness

Platinum Member
Jul 3, 2013
2,407
736
136
One not working and the rest being "completeable" certainly doesn't sound like perfect emulation.
Hmm you’re right, I was a bit hasty.

According to some, higan and ares are pixel accurate cf
But in 2021 a part of the SNES was not perfectly emulated; see https://arstechnica.com/gaming/2021/06/how-snes-emulators-got-a-few-pixels-from-complete-perfection/
I wonder if they made progress on that part, but that really looks close to perfect emulation.
 

SarahKerrigan

Senior member
Oct 12, 2014
361
519
136
I will simply say that defining "proper x86 emulation" as "cycle-accurate" is ridiculous. 99.99% of the uses of real-world x86 emulation/translation have absolutely no benefit from near-cycle-accurate emulation of random 90s microarchitectures and their peripheral ICs.

Obviously it's useful for retrocomputing folks, people wanting to run certain older games, etc - but that isn't where most interest in emulation of x86 is, and I don't think that it's inherently more "proper" than anything else.
 