They are very different beasts. Sometimes they'll be close, sometimes one faster, sometimes the other. Usually it's not as pronounced as 10x, more along the lines of 2-4x, but hey, gotta try to make a point, right?

One reason for not programming straight on the hardware is so that vendors can implement your code however they think best, on whatever hardware they think is best. I wouldn't even know where to start figuring out the performance differences across all the mobile vendors beginning to support OpenCL.
I respect both AMD and NVidia GPUs, as each is good in its arena, but NVidia should price their products more reasonably. I've always suspected that for engineering-class applications, where data crunching dominates over rendering, Nvidia would rule, because their Quadro is generally chosen over ATI's FirePro in that space. The 69xx is probably an exception, though, due to its floating-point support (ref Wikipedia).
Nvidia artificially crippled double-precision floating point on the Fermi GeForces. Commoditization is on its way in, and that's one way they're fighting it off for a few more years. As for Quadros: back in the day they had far superior drivers; more recently it's been momentum from then, and now they got ECC memory before AMD did.
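To make the crippling concrete, here's a rough sketch (my own, not from any benchmark suite) of how you could probe it with two OpenCL kernels: enqueue each over the same global size, wall-clock them, and compare rates. The iteration count and constants are arbitrary; on a Fermi GeForce the double version should come out near 1/8 of the float rate, versus roughly 1/2 on a Fermi Tesla/Quadro, if the commonly quoted spec ratios are right.

    /* Probe SP vs DP arithmetic throughput. Time each kernel over
       the same global size; the ratio of rates shows how far
       double precision is capped below single precision. */
    #pragma OPENCL EXTENSION cl_khr_fp64 : enable

    __kernel void fma_float(__global float *out) {
        float x = (float)get_global_id(0);
        for (int i = 0; i < 4096; i++)
            x = x * 1.000001f + 0.5f;  /* dependent multiply-adds */
        out[get_global_id(0)] = x;     /* store so the loop isn't optimized away */
    }

    __kernel void fma_double(__global double *out) {
        double x = (double)get_global_id(0);
        for (int i = 0; i < 4096; i++)
            x = x * 1.000001 + 0.5;
        out[get_global_id(0)] = x;
    }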
Your comment highlights that Nvidia GPUs are like parallel CPUs, which is interesting. Do you have any references/pointers where I can read more about this?
Summaries of the most important difference(s), after scalar v. VLIW:
http://www.realworldtech.com/page.cfm?ArticleID=RWT121410213827&p=8
http://www.realworldtech.com/page.cfm?ArticleID=RWT093009110932&p=8
The main thing Fermi has going on is integrating ideas from old supercomputer processors, and going for running arbitrary highly-parallel code first, graphics second.
The gap is slowly being closed, isn't it?
To some degree GCN definitely will close it, but by how much, who knows? Current Radeons still have too simplistic a view of memory, and AMD is bringing in GCN to fix that (among other things). But that won't necessarily mean they'll perform the same.
Moreover, NVidia's CUDA requires special tools, but OpenCL does not; a usual Linux/Windows development environment should be sufficient. But I may be wrong.
NVidia supports OpenCL, too. In all cases, if you want the best performance, you'll want the vendor's tools, because OpenCL is still too low-level to completely free you from worrying about registers and other memory details. If it's an obvious case for GPGPU, though, generic code should still blow away CPUs.
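To back up the tooling point: here's a minimal OpenCL vector-add in plain C (a sketch, with all error checking stripped out). It builds with nothing beyond a stock toolchain and the vendor's OpenCL headers/library, e.g. gcc vecadd.c -lOpenCL; the driver compiles the kernel source at runtime. Note that even in this trivial case you're choosing memory flags and work sizes yourself, which is the low-level part.

    /* Minimal OpenCL vector add in plain C -- no vendor-specific
       compiler needed; the driver JIT-compiles the kernel source. */
    #include <stdio.h>
    #include <CL/cl.h>

    static const char *src =
        "__kernel void vecadd(__global const float *a,\n"
        "                     __global const float *b,\n"
        "                     __global float *c) {\n"
        "    int i = get_global_id(0);\n"
        "    c[i] = a[i] + b[i];\n"
        "}\n";

    int main(void) {
        enum { N = 1024 };
        float a[N], b[N], c[N];
        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0f * i; }

        cl_platform_id plat;
        cl_device_id dev;
        clGetPlatformIDs(1, &plat, NULL);
        clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);

        cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
        cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

        cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
        clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
        cl_kernel k = clCreateKernel(prog, "vecadd", NULL);

        /* You still pick the memory flags and work sizes yourself. */
        cl_mem da = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                   sizeof a, a, NULL);
        cl_mem db = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                   sizeof b, b, NULL);
        cl_mem dc = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY, sizeof c, NULL, NULL);

        clSetKernelArg(k, 0, sizeof da, &da);
        clSetKernelArg(k, 1, sizeof db, &db);
        clSetKernelArg(k, 2, sizeof dc, &dc);

        size_t global = N;
        clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
        clEnqueueReadBuffer(q, dc, CL_TRUE, 0, sizeof c, c, 0, NULL, NULL);

        printf("c[42] = %f\n", c[42]);  /* expect 126.0 */
        return 0;
    }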
If you plan to try to use the GPU for processing on Linux, I would absolutely go with the Geforce. Both companies' binary drivers are behind the times, but nVidia's works, usually with no more effort than installing a package. AMD's binary drivers are needed to get all the features of their current-gen chips, and while they sometimes work effortlessly, they are just as often a nightmare.