The problem with emulators is then you've just created a proxy test for JIT compiler performance. Which is not to say that they aren't useful as an application benchmark, just that it's probably not a good architecture benchmark.
I can't really say a lot about other emulators, but that's not an accurate statement for mine. My benchmarking breaks down the time spent on 2D, 3D, geometry, audio, and everything else (including CPU) by disabling each of those subsections. It varies a lot from game to game, but because it uses software rendering, it's common for the time spent on 3D to be similar to the time spent on everything else. 2D, 3D, and geometry all use SIMD heavily (at least on ARM), but they also put a lot of pressure on other things, like various parts of the memory subsystem and, in places, branch prediction. The rendering also scales with core count, up to 3 threads anyway (or 2 for 2D).
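To make the "disable a subsection and see what's left" idea concrete, here's a rough sketch of the arithmetic. This isn't the emulator's actual benchmarking code; the function name and all the timings are made up for illustration, and it assumes the subsystem costs are roughly additive.

```python
def subsystem_costs(baseline_ms, disabled_ms):
    """Estimate per-subsystem cost as (time with everything on) minus
    (time with that one subsystem disabled).

    baseline_ms: frame time with all subsystems enabled.
    disabled_ms: maps a subsystem name to the frame time measured
                 with only that subsystem turned off.
    Returns the per-subsystem costs plus an "everything else" bucket.
    """
    costs = {name: baseline_ms - t for name, t in disabled_ms.items()}
    # Whatever the disabled subsystems don't account for (CPU, events, ...).
    costs["everything else"] = baseline_ms - sum(costs.values())
    return costs


# Hypothetical per-frame timings in milliseconds, NOT real measurements.
baseline = 10.0
disabled = {"2d": 8.5, "3d": 6.0, "geometry": 9.0, "audio": 9.5}

costs = subsystem_costs(baseline, disabled)
for name, ms in costs.items():
    print(f"{name:16s} {ms:5.2f} ms  {100 * ms / baseline:5.1f}%")
```

With threaded rendering the costs won't be strictly additive, since disabling one subsystem can change how the others overlap, so the "everything else" bucket is only an estimate.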
Even the "everything else" part is not entirely spent in translated code. There's time spent managing events (on DS, lots of state changes happen during the screen update), switching between the two CPU cores, etc. And unlike PC programs, DS games (and a lot of other console games) spend a significant amount of time accessing peripherals directly through memory-mapped registers. Typical numbers can be 5% of reads and 10% of writes. How this is handled has a big impact on performance, and it's also separate from translated code.
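For anyone unfamiliar with why memory-mapped registers cost time outside the translated code, here's a toy sketch of a bus that routes each access either to plain RAM or to an I/O handler. It's an illustration only, not how any particular emulator does it: the `Bus` class, the handler registration, and the address ranges are all assumptions picked for the example.

```python
from collections import Counter

# Illustrative address ranges, assumed for this sketch only.
RAM_BASE, RAM_SIZE = 0x02000000, 0x400000
IO_BASE, IO_SIZE = 0x04000000, 0x1000


class Bus:
    """Toy memory bus: RAM is a flat byte array, I/O goes through handlers."""

    def __init__(self):
        self.ram = bytearray(RAM_SIZE)
        self.io_regs = {}            # raw register values by address
        self.io_write_handlers = {}  # address -> callback(value)
        self.stats = Counter()       # counts accesses by kind

    def on_io_write(self, addr, handler):
        """Register a (hypothetical) peripheral callback for one register."""
        self.io_write_handlers[addr] = handler

    def read8(self, addr):
        if RAM_BASE <= addr < RAM_BASE + RAM_SIZE:
            self.stats["ram_read"] += 1
            return self.ram[addr - RAM_BASE]  # fast path
        if IO_BASE <= addr < IO_BASE + IO_SIZE:
            self.stats["io_read"] += 1
            return self.io_regs.get(addr, 0)  # slow path
        raise ValueError(f"unmapped read at {addr:#010x}")

    def write8(self, addr, value):
        value &= 0xFF
        if RAM_BASE <= addr < RAM_BASE + RAM_SIZE:
            self.stats["ram_write"] += 1
            self.ram[addr - RAM_BASE] = value  # fast path
            return
        if IO_BASE <= addr < IO_BASE + IO_SIZE:
            self.stats["io_write"] += 1
            self.io_regs[addr] = value
            handler = self.io_write_handlers.get(addr)
            if handler is not None:
                handler(value)  # side effects live in emulator code
            return
        raise ValueError(f"unmapped write at {addr:#010x}")


bus = Bus()
seen = []
bus.on_io_write(IO_BASE, seen.append)  # pretend this is a display register
bus.write8(RAM_BASE + 4, 0x12)
bus.write8(IO_BASE, 0x80)
print(dict(bus.stats), seen)
```

The point is just that every I/O access lands in a handler written in the emulator itself, so its cost shows up outside the translated code no matter how good the JIT is.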
If I look at a profile, I can generally see several dozen functions in the top 80+% of execution time, and the actual translated code and the store handler it uses generally account for < 20% of runtime, except in very CPU-heavy or pathological games. It's a major part of what the test is about, but that doesn't make the whole thing a proxy for it. And "JIT performance" itself isn't really a single thing; part of it comes down to the emulator's translation facilities, but another part comes down to the behavior of whatever it's actually emulating, which is highly variable.
On the other hand, I would guess that the Dolphin POV-Ray test currently done on AT is a lot closer to spending most of its time in translated code, since I doubt much is going on with 3D or audio there, and the stuff it's running is probably a much simpler, more limited test case than a real game.