I mean, the CPU is designed around an ISA (like x86, ARMvX, MIPS), yet GPUs are designed around APIs. Is it that the concepts of an ISA and an API are not that far apart? If one or two architectures didn't dominate, would there be many others on the market, if the landscape were similar to the GPU market, or would it still end up being a duopoly (AMD vs. Nvidia)? What if, instead of x86 or ARM, they were targeting the OpenCL API? Is this the future?
Kind of neither and both.
CPUs have been designed for quite some time to run C. If you come out with a new CPU, it had better be easy to make C89 compile to it. That means it needs to have some way to support or emulate a flat integer memory space, support or implement a C-compatible data stack and call stack, and so on. A lot more ends up running on top of that, but it's pretty much a make-or-break requirement, unless the chip is going to run a truly custom OS and only be used for a very small, targeted market.
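To make that concrete, here's a tiny C89-ish sketch (not from any real codebase, purely an illustration) of the assumptions a new CPU has to make cheap: one flat, byte-addressed memory that pointers walk like integers, locals the compiler can stick on a stack, and plain call/return for recursion.

```c
#include <stdio.h>
#include <string.h>

/* Toy illustration of what "C-friendly" hardware has to make cheap:
 * flat byte-addressed memory, a call stack for recursion,
 * and addressable locals. */

static unsigned long count_bytes(const char *p)
{
    if (*p == '\0')
        return 0;
    return 1 + count_bytes(p + 1);   /* recursion -> needs a call stack */
}

int main(void)
{
    char buf[16];                    /* local -> lives on the data stack */
    char *p = buf;                   /* pointer into one flat address space */

    strcpy(buf, "hello");
    p += 2;                          /* byte-granular pointer arithmetic */
    printf("%c %lu\n", *p, count_bytes(buf));   /* prints: l 5 */
    return 0;
}
```

If any of those things is awkward on your hardware, you either fake it in the compiler or you don't get the existing software ecosystem.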
Any CPU that can run a popular OS can also run things like Java, be programmed against POSIX, and so on.
However, internally, CPUs can be very different.
Up to Intel's 386, x86 instructions were pretty much directly routed to execution units, with only enough extra logic to make the CPU stall when it needed to wait on something.
The 486, and then Pentium, added features that looked somewhat like modern/RISC pipelines and decoders, but with odd gotchas, and not for all instructions.
AMD's K5, however crappy, was basically a RISC processor with an x86 translator bolted on. The K6, which wasn't bad at all, was the same way, but designed as a processor just for x86 execution (they actually called the internal ISA "RISC86").
Intel's Pentium Pro (which begot the PII, PIII, Core, and Core 2) was done very similarly, though it did have some direct x86 execution. It had practically no similarities to its predecessors. In fact, from what I've read, it seems many inside Intel weren't even confident that the approach could be executed well, much less be as fast as the best RISC CPUs at integer workloads.
AMD's Athlon was very different again, but built along similar lines to the Pentium Pro family. It had many similarities to Alpha processors that never came to be (much as Netburst did).
The Pentium 4 (Netburst) was also a radical departure. Inside, it was nothing remotely like most other x86 CPUs (but not necessarily in a good way).
And so on. Each CPU and its related hardware took the same machine code and made sure the same results came back out of it, but did so in its own way. The instructions used to hit a section that dumbly routed them where they needed to go. Nowadays, there is an internal/external boundary, and the ISA is not unlike an API, or a bytecode spec: it defines what, but not how.
Of course, way out in left field, Transmeta had some CPUs that actually ran x86 code in software. Instruction translation, register renaming, and all that were done by software that the CPU ran. It was like a CPU emulator with physical peripherals. The entire x86 ISA may as well have been just an API to their CPUs; or, more accurately, like Java bytecode. While they did get something out of lawsuits, they mostly failed because they were too radical and ambitious for their size, and too far ahead of their time for the public to latch on.
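For flavor, "running an ISA in software" boils down to a fetch/decode/execute loop. Here's a toy one in C for a made-up three-opcode machine; it's only an illustration of the idea, not Transmeta's actual Code Morphing Software, which translated guest code into cached native VLIW code rather than re-interpreting it every pass.

```c
#include <stdio.h>

/* Toy "CPU in software": a fetch/decode/execute loop for a made-up
 * two-register, three-opcode machine. The point is that the guest ISA
 * is just data interpreted by a program running on the real hardware. */

enum { OP_LOADI = 0, OP_ADD = 1, OP_HALT = 2 };

int main(void)
{
    /* "Guest" program: r0 = 40; r1 = 2; r0 = r0 + r1; halt */
    unsigned char code[] = {
        OP_LOADI, 0, 40,
        OP_LOADI, 1, 2,
        OP_ADD,   0, 1,
        OP_HALT
    };
    int reg[2] = { 0, 0 };
    unsigned pc = 0;

    for (;;) {
        unsigned char op = code[pc];         /* fetch */
        if (op == OP_LOADI) {                /* decode + execute */
            reg[code[pc + 1]] = code[pc + 2];
            pc += 3;
        } else if (op == OP_ADD) {
            reg[code[pc + 1]] += reg[code[pc + 2]];
            pc += 3;
        } else {                             /* OP_HALT */
            break;
        }
    }
    printf("r0 = %d\n", reg[0]);             /* prints: r0 = 42 */
    return 0;
}
```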
Early on, OpenGL was the thing professionals used. Early GPUs were fixed-function OpenGL accelerators: they took parts of the common OpenGL rendering pipeline and set up dedicated hardware for them. Many of the little things still got done by the CPU.
OpenGL, however, was, and still is, a fragmented, design-by-community thing. This gives it strengths and weaknesses. One of the weaknesses is a lack of strict hardware requirements. OTOH, that can be a strength for HW makers, as they can decide what gets implemented in hardware and how, with a level of creativity otherwise not possible.
Microsoft saw the weaknesses as much more important than the strengths for games, which sold Windows PCs. So they came up with Direct3D, which included hardware requirements. The first few versions mirrored OpenGL; as time went on, they diverged more. By having those requirements, CPU bottlenecks to performance that didn't matter for professionals could be done away with, so a faster video card would plain be a faster video card, within reason, for a given generation. The hardware may be partly designed for the API, but there are also specific hardware requirements for supporting it, which OpenGL (non-ES) lacks. Older GPUs not supporting newer OpenGL versions tends to be more of a business decision than a technical one.
Now, also, by the time GPUs were becoming more complicated, JIT and AOT runtime compilers had proven their mettle in the real world (in the 70s and 80s, it was debatable whether they would actually be worthwhile). So AMD, nVidia, and all the players we've lost could look at Java and go, "aha!" They could take the sets of calls in between necessary GPU actions, and combine predefined optimized subroutines with a run-time compiler to generate the rest. That gave them the benefits of directly-compiled code for the GPU, plus the benefit of recompiling that code for better performance or compatibility with new drivers or new hardware, without the program being serviced having a clue it was happening. Shaders heavily rely on such technology too, as DX9-and-up shaders are abstract enough that they may as well be written in scripting languages.
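You can see that runtime-compilation model right at the API surface: the application hands the driver shader source, and the driver compiles it for whatever GPU is actually installed. Here's a rough sketch using the standard OpenGL ES 2.0 shader calls; it assumes a GL context is already current, and the names frag_src and compile_fragment_shader are just made up for the example.

```c
#include <stdio.h>
#include <GLES2/gl2.h>   /* assumes an EGL/GLES 2.0 context is already current */

/* The driver, not the application, turns this source into GPU machine code.
 * A new driver or a new GPU can compile it completely differently without
 * the application changing at all. */
static const char *frag_src =
    "precision mediump float;\n"
    "void main() { gl_FragColor = vec4(1.0, 0.5, 0.0, 1.0); }\n";

GLuint compile_fragment_shader(void)
{
    GLuint shader = glCreateShader(GL_FRAGMENT_SHADER);
    GLint ok = GL_FALSE;

    glShaderSource(shader, 1, &frag_src, NULL);  /* hand source to the driver */
    glCompileShader(shader);                     /* driver's compiler does the rest */

    glGetShaderiv(shader, GL_COMPILE_STATUS, &ok);
    if (ok != GL_TRUE) {
        char log[512];
        glGetShaderInfoLog(shader, sizeof log, NULL, log);
        fprintf(stderr, "shader compile failed: %s\n", log);
    }
    return shader;
}
```

The same source ships to every GPU vendor, and each driver is free to generate whatever native code it likes for its own hardware.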
Fast forward to today. Very few people need or want to code just for a single GPU ISA. Everybody can build theirs how they please, as long as they get all the right feature checkboxes filled. There are, however, decades' worth of software packages that expect to be compiled to assembly code for CPUs, and may even rely on that. If you want to have a chance at running Linux, *BSD, QNX, etc., you need to be able to get and use a C compiler, and be compatible, at a low level, with common peripheral interface specifications.
So, CPUs have powerful translating front-ends to use the same ISA with different back-ends, while GPUs have more powerful drivers and firmware (which is much cheaper), since they don't have to spend as much of that R&D money on making an existing ISA perform magic tricks. The GPU guys can modify the ISA without anyone having to change anything that 99% of the programs using the hardware see (this is also one of the benefits of things like .NET and Java on CPUs).
In both cases, it's blurrier than it first appears, but a lot of it is just the history of how they've been used.