What's more beneficial per core: shared schedulers or dedicated schedulers? To me, shared seems more flexible. For example, if only one pipe is needed for some execution, does it get most of the schedulers dedicated to it?
If both pipes are being used, the schedulers are working their hardest to distribute work appropriately, but hopefully they can mix and match and be more efficient in a shared setup?
If that's the case, couldn't the "not quite" doubling of schedulers be a non-issue, since going to shared means more effective utilization of them?
This is a design question. Intel CPUs cannot execute integer and SSE/FPU instructions in parallel on the same issue port, so a separate scheduler doesn't make sense. They use a shared scheduler because the execution resources are shared.
AMD CPUs can execute integer and SSE/FPU instructions in parallel, and therefore they have always had dedicated schedulers. AMD's problem was that real software could seldom make use of this massive execution parallelism (3*integer + 3*FPU/SSE + 3*AGU + 3*load/store in one cycle), all the more so because most compilers optimized for Intel and because it is generally difficult to exploit parallel execution resources. Intel learned the same lesson with Itanium.
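To make the shared vs. dedicated trade-off concrete, here is a toy model in Python. It is only a sketch under my own assumptions, not how any real Intel or AMD scheduler works: every op waits a fixed number of cycles in a scheduler entry before it can issue, dispatch is in order and stalls when the target queue is full, and the names `simulate`, `SHARED`, `DEDICATED` and all the sizes (16 entries, 3-wide, 4-cycle wait) are made up for illustration.

```python
from collections import deque


def simulate(workload, queues, capacity, wait=4, dispatch_width=3, issue_width=3):
    """Return the number of cycles needed to issue every op in `workload`.

    workload: list of op kinds ("int" or "fp") in program order.
    queues:   maps op kind -> queue name; kinds with the same name share entries.
    capacity: maps queue name -> number of scheduler entries (hypothetical sizes).
    """
    pending = deque(workload)
    waiting = {name: [] for name in capacity}  # queue name -> [(kind, ready_cycle)]
    cycle = 0
    while pending or any(waiting.values()):
        # Issue stage: each pipe kind issues at most `issue_width` ready ops.
        issued = {}
        for name in list(waiting):
            remaining = []
            for kind, ready in waiting[name]:
                if ready <= cycle and issued.get(kind, 0) < issue_width:
                    issued[kind] = issued.get(kind, 0) + 1
                else:
                    remaining.append((kind, ready))
            waiting[name] = remaining
        # Dispatch stage: in order, stalling at the first op whose queue is full.
        for _ in range(dispatch_width):
            if not pending:
                break
            name = queues[pending[0]]
            if len(waiting[name]) >= capacity[name]:
                break
            waiting[name].append((pending.popleft(), cycle + wait))
        cycle += 1
    return cycle


# Same total number of entries in both setups (assumed: 16).
SHARED = ({"int": "shared", "fp": "shared"}, {"shared": 16})
DEDICATED = ({"int": "intq", "fp": "fpq"}, {"intq": 8, "fpq": 8})

int_only = ["int"] * 120
mixed = ["int", "fp"] * 60

for label, work in (("int only", int_only), ("mixed", mixed)):
    for setup_name, (queues, capacity) in (("shared", SHARED), ("dedicated", DEDICATED)):
        print(label, setup_name, simulate(work, queues, capacity))
```

In this model the integer-only stream finishes sooner with the shared queue, because it can use all the entries, while the evenly mixed stream takes the same time in both setups. That matches the intuition in the question, but it only holds under the assumptions above.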
Intel's Core design has 3 * Integer/FPU/SSE + 1 AGU + 2 load/store. But as you know, having fewer units does not hurt the performance of Intel Core.
You might then ask why AMD chose a design that wastes resources for their old and current products. It simplified the architecture (having 3 of everything simplifies scheduling a lot), and they didn't know at the time that software would not be able to make use of those massive execution resources (especially AGU and FPU/SSE).
Bulldozer is the technical solution, or if you like the "magic trick", to squeeze everything out of their design approach.
It has 2*integer + 2*FPU/SSE + 2*AGU per core (yes, still 2 of everything to keep it simple, though the point of a second AGU is questionable). That is much more economical because fewer resources are wasted (an AGU does not cost much). The high-clock design compensates for the missing third integer execution unit, while the third FPU/SSE and AGU went unused in real software and just consumed die area and power.
All this reduction freed up die area to add another core. That works because the front end and back end are also oversized in current designs. So you get another core for free, so to speak. For the chip as a whole, you get double the cores at the same price, or let's say the same production cost, die size and power consumption.
That is also why Intel will have to do much more work to get a Sandy Bridge successor to use the same design trick, though that is no real problem for Intel as they have plenty of resources to do that even if it means more work.
All this is much more complex than I have written here; the above is simplified to point out what has been done with Bulldozer and why. For example, there are some constraints in former AMD CPUs that limit their superscalarity in some cases, and Intel has a design advantage with parallel integer instructions because the issue port is shared.
With the Core architecture, Intel pushed their design approach to its limits, and now Bulldozer will push AMD's design approach to its limits.
AMD will then have more execution units (4+2 in Bulldozer vs. 3 in Sandy Bridge) running at better utilization (IPC/unit) and at higher clock rates. Those are the three important improvements, besides some minor ones, that will give this radical performance boost.
Intel likely has better branch prediction with Sandy Bridge, some minor core features that AMD doesn't provide with Bulldozer, and Hyperthreading.
Therefore Bulldozer is not twice as fast as Sandy Bridge but, as this rumor suggests, something like 50% faster overall, though the improvement will vary greatly depending on the benchmark used.
And a word regarding this: "AMD 8 core vs. Intel 4 core. Ha I am not impressed." Once again: both CPUs will show 8 cores to the operating system. The only difference is where the 4 extra cores come from: Intel's are virtual cores from Hyperthreading, while AMD's are real cores from their "split module technology".
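If you want to see what "cores shown to the operating system" means on your own machine, a quick check in Python looks like this. `os.cpu_count()` reports logical CPUs, so Hyperthreading siblings are counted the same way as physical cores; the wrapper name `logical_cpu_count` is just mine for illustration.

```python
import os


def logical_cpu_count():
    """Number of logical CPUs the OS reports, or None if it cannot be determined."""
    return os.cpu_count()


print(logical_cpu_count())
```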
And another word about the rumor. This could really get interesting, because the comparison leaked on donanimhaber shows media, rendering and gaming. What is interesting is that none of those should be the real strength of AMD Bulldozer, since its real strength is integer power: databases, compilers, web servers, chess, etc.