Yes, ask an x86 to emulate an ARM core (all emulation suffers a performance hit) and it will happily oblige. Ask an ARM core to emulate an x86 core and you'd better put a pot of coffee on.
Not if we're talking about two cores that have similar performance levels like Atom and Cortex-A9.
x86 doesn't have emulation magic built into its ISA. Chips that are locked out of x86-64, like Medfield, are at a distinct disadvantage when emulating chips that have more registers. On the flip side, it means 32-bit x86 will be an available (if not the only) x86 Android NDK target for some time to come, and that eases emulation going the other way.
Sure, there's x86 stuff that takes multiple instructions to emulate on ARM: load-op and RMW instructions, 8- and 16-bit operations (although you will typically not have to keep them coherent with their larger registers after every instruction, if ever at all), large immediates, push/pop, call/ret, SSE stuff that doesn't fit, etc. But the same goes the other way, for ARM stuff on x86: three-address arithmetic, folded shifts, address adjust, block memory, predication, NEON stuff that doesn't fit, etc. Someone would have to write very optimized translators to get a good feel for which one can do better; it's probably not an easy question to answer before that.
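To make the "multiple instructions in both directions" point concrete, here's a toy sketch of the kind of expansion a translator has to do. Everything in it is made up for illustration: the function names, the choice of r12 and edx as scratch registers, and the assumption that guest registers are already mapped to host registers. It ignores guest flag semantics entirely and isn't modelled on any real translator.

```python
# Toy illustration only. Function names and scratch register choices are
# hypothetical, and guest flag semantics are deliberately ignored.

def x86_loadop_to_arm(op, dst, base, disp):
    """x86 `op dst, [base+disp]` -> ARM has no load-op form, so load first."""
    return [
        f"ldr r12, [{base}, #{disp}]",  # r12 as scratch (assumption)
        f"{op} {dst}, {dst}, r12",
    ]

def arm_add_lsl_to_x86(dst, src1, src2, shift):
    """ARM `add dst, src1, src2, lsl #shift` -> x86 host code."""
    if 0 <= shift <= 3:
        # lea happens to cover three-address add with a scale of 1, 2, 4 or 8.
        return [f"lea {dst}, [{src1} + {src2}*{1 << shift}]"]
    # Larger shifts need an explicit shift through a scratch register,
    # which also keeps things correct if dst aliases src1 or src2.
    return [
        f"mov edx, {src2}",  # edx as scratch (assumption)
        f"shl edx, {shift}",
        f"add edx, {src1}",
        f"mov {dst}, edx",
    ]

if __name__ == "__main__":
    print("\n".join(x86_loadop_to_arm("add", "r0", "r1", 8)))
    print("\n".join(arm_add_lsl_to_x86("eax", "ebx", "ecx", 5)))
```

Note that the long x86 sequence uses shl and add, both of which overwrite the host flags even though the ARM instruction being emulated leaves its flags alone.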
I do know that ARM emulation on x86 suffers because essentially all x86 ALU instructions clobber the flags, meaning you will often need relatively expensive flags save/restore or other strategies to get around it. It's to the extent that the researchers writing the translation software wanted a flags-nullifying instruction prefix added to Atom, but no such prefix was added. x86 emulation on ARM doesn't have the same kind of problem because ARM ALU instructions only update the flags when asked to, and x86's more eager flags destruction means the guest's flags are more likely to be dead and so take less work to emulate.
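The "flags are more likely to be dead" part can be sketched as a backward liveness pass over a basic block. This is a simplified model, not how any particular translator does it: the tuple format is invented, and it assumes every flag-writing instruction overwrites all flags, which glosses over partial updates like x86 inc/dec leaving CF alone.

```python
# Toy flag-liveness pass. The (name, writes_flags, reads_flags) tuple format
# is hypothetical, and every flag write is assumed to overwrite all flags.

def live_flag_writers(block, live_out=True):
    """Return indices of instructions whose flag results are actually needed.

    live_out=True conservatively assumes the flags are read after the block.
    """
    needed = []
    live = live_out
    for i in range(len(block) - 1, -1, -1):  # walk the block backwards
        _name, writes, reads = block[i]
        if writes:
            if live:
                needed.append(i)  # a later instruction consumes these flags
            live = False          # a full overwrite kills earlier flag values
        if reads:
            live = True           # flags must be valid on entry to this one
    return sorted(needed)

if __name__ == "__main__":
    # Typical x86-style guest code: every ALU op writes flags, one is read.
    guest = [
        ("add", True, False),
        ("sub", True, False),
        ("cmp", True, False),
        ("jne", False, True),
    ]
    print(live_flag_writers(guest))
```

In that block only the cmp needs its flags computed; the add and sub results are overwritten before anything reads them, so a translator can skip emulating their flag effects.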
I also know that if you tried to do this right now, with the ARM emulation running on the best available Atoms like Clover Trail and the x86 emulation on top ARM cores (Swift, Krait, A15), the Atom would be at a big disadvantage by being in-order. Statically re-scheduling code during translation is hard to do, especially when you have too few registers.