Anyway, what does all this out-of-order business mean? Is the 7000 series going to be more like an x86 CPU, and in that case, wouldn't that be bad? Don't modern GPUs use a RISC architecture?
Well, in the sense that everything designed after RISC is RISC.
Current AMD GPUs are VLIW, like Itanium, while nVidia uses a SIMD/MIMD system, much like this new one AMD just revealed.
And OoO is IMHO not the correct word for what they do here. Ultrasuperhyperthreading?
Bound to give big improvements to performance.
That has to benefit not only compute but graphics too, right? Unused resources suck regardless of what you're using your card for.
Not really. A single-precision FMA unit is ~20,000 transistors, and the chips they build have over a billion transistors. They could add a couple of thousand extra of them and it would hardly be noticeable in chip size and power.
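To put numbers on that, here's a rough back-of-envelope using only the figures from this post (real dies of this class are well past two billion transistors, which makes the fraction even smaller):

    # Back-of-envelope cost of adding extra FMA units (figures from the post).
    fma_transistors = 20_000          # ~transistors per single-precision FMA unit
    extra_units = 2_000               # "a couple of thousand extra of them"
    die_transistors = 1_000_000_000   # "over a billion transistors" (conservative)

    added = fma_transistors * extra_units
    print(f"extra transistors: {added:,}")                        # 40,000,000
    print(f"fraction of the die: {added / die_transistors:.1%}")  # 4.0%, less on a 2B+ die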
What costs is getting the data where it needs to be, when it needs to be there. VLIW was used because it's relatively cheap to implement with regard to shuffling data around (4 individual register files, as opposed to one large one, and you only move data between them through the EUs), and while it only helps in limited situations, graphics was one of them.
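A toy illustration of why that trade-off suited graphics (made-up Python stand-ins, not a real shading language): a pixel shader working on a 4-component colour hands the hardware four independent operations per "instruction", which is exactly what a VLIW4 bundle wants, while a dependent scalar chain leaves most of the slots empty.

    # Conceptual only: vec4-style shader math fills all four VLIW slots.
    def shade_pixel(albedo, light, ambient):
        # r, g, b, a are computed independently -> 4 slots busy per bundle,
        # and each lane's operands stay in its own small register file.
        return tuple(albedo[i] * light[i] + ambient[i] for i in range(4))

    # A dependent scalar chain (common in compute) can only fill one slot
    # per bundle, leaving the other three execution units idle.
    def dependent_chain(x):
        x = x * 2.0 + 1.0   # slot 0 busy, slots 1-3 idle
        x = x * x           # must wait for the previous result
        return x

    print(shade_pixel((0.5, 0.3, 0.2, 1.0), (1.0, 0.9, 0.8, 1.0), (0.1, 0.1, 0.1, 0.0)))
    print(dependent_chain(2.0))   # 25.0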
Now they have to shuffle their warps between the 4 execution units of the CU -- I guarantee you that in the domain of graphics, they are wasting more power and die area on the new design than on the old one. It's just that now it's useful for a lot more things. (And it's also good for more exotic graphics routines -- I believe John Carmack has been itching to program GPUs more directly to implement some cooler rendering algorithms.)
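For the curious, here's a very rough sketch of what "shuffling warps" (wavefronts, in AMD's terms) between the four execution units might look like per cycle -- this is a guess at the general idea from the public slides, not AMD's actual scheduler, and the names are made up:

    # Rough sketch: each cycle, every SIMD in the compute unit tries to issue
    # from one of the wavefronts parked on it.  Keeping many wavefronts in
    # flight is what hides the latency of moving data around.
    from collections import deque

    NUM_SIMDS = 4

    def schedule_cycle(simd_queues):
        """simd_queues: one deque of resident wavefronts per SIMD."""
        issued = []
        for q in simd_queues:
            if q:                    # a ready wavefront is available
                wave = q.popleft()   # issue from it this cycle...
                q.append(wave)       # ...then rotate it to the back
                issued.append(wave)
        return issued

    queues = [deque(f"simd{i}_wave{j}" for j in range(3)) for i in range(NUM_SIMDS)]
    print(schedule_cycle(queues))    # up to 4 wavefronts issue per cycle, one per SIMD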
Again, sounds like it'll give big improvements to performance.
Again, mostly for complex tasks. There aren't that many pixel-independent operations, and not much control flow, in graphics tasks.
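To make that concrete (toy Python stand-ins, not real shader code): per-pixel work is almost always the same straight-line math for every pixel, whereas the data-dependent loops and branches the new control-flow hardware is good at show up mostly in compute workloads.

    # Illustrative only.
    def fragment(base, light):
        # Every pixel runs the same instruction stream: no branches to speak of.
        return [base[i] * light[i] for i in range(4)]

    def count_steps(x, limit=100):
        # Data-dependent loop: iteration count differs per work-item.  This is
        # the kind of control flow that benefits from the new design.
        n = 0
        while x != 1 and n < limit:
            x = x // 2 if x % 2 == 0 else 3 * x + 1
            n += 1
        return n

    print(fragment([0.5, 0.25, 0.1, 1.0], [1.0, 0.9, 0.8, 1.0]))
    print([count_steps(x) for x in (7, 8, 27)])   # wildly different loop lengths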
Sounds good. I guess this means Cayman was doing "expensive" trips to VRAM for synchronization? So there's performance to be found here as well.
Synchronization was expensive on Cayman. Guess how often most graphics engines synchronize anything? A handful of times per frame.
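A quick illustration of the difference (pseudo-workloads in Python, not a real graphics or compute API): a frame's worth of rendering syncs globally only a couple of times, while something like a multi-pass reduction synchronizes between every pass -- that's where cheaper synchronization pays off.

    # Illustrative contrast only.
    def render_frame(draw_calls):
        for call in draw_calls:   # thousands of draws, no global sync between them
            pass                  # rasterize
        # global sync maybe once or twice here: before post-processing, before present
        return "present"

    def reduce_sum(values, width=4):
        # Each pass over the data is one synchronization point.
        while len(values) > 1:
            values = [sum(values[i:i + width]) for i in range(0, len(values), width)]
        return values[0]

    print(render_frame(range(5000)))
    print(reduce_sum(list(range(16))))   # 120, after two sync-separated passes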
BOOOOOM!.... AMD tessellation performance will skyrocket with the new cards.
Which is one of those areas where they've been a bit weaker than Nvidia (at least in synthetic benchmark programs).
Which is a bit sad, because in reality there was no need for this. They were perfectly capable of delivering more than one triangle (for overdraw) per pixel on screen -- any program which wanted more tessellation capability than AMD could provide was simply wasting it, as opposed to using it. Marketing wins yet again.
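A back-of-envelope version of that claim -- the resolution, refresh rate, and setup rate below are my own illustrative assumptions, not figures from AMD:

    # How many triangles per second does "one-plus triangle per pixel" actually need?
    pixels = 1920 * 1080          # ~2.07 M pixels (assumed 1080p target)
    fps = 60                      # assumed refresh rate
    tris_per_pixel = 2            # "more than one triangle per pixel", with overdraw
    needed = pixels * fps * tris_per_pixel
    print(f"needed:    {needed / 1e6:.0f} M triangles/s")      # ~249 M/s

    core_clock = 880e6            # assumed Cayman-class clock
    setup_per_clock = 2           # assumed primitives per clock (dual setup engines)
    available = core_clock * setup_per_clock
    print(f"available: {available / 1e9:.2f} B triangles/s")   # ~1.76 B/s, plenty of headroom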
Basically, all the real improvements I can see are for compute. Ignoring the ROPs, and unless there is anything they are keeping to themselves, this should do worse for power/area when compared to Cayman on the same process node. But it should bring compute parity with NV, and given how much better AMD consistently seems to be on the silicon design side (as opposed to the chip architecture), it's entirely feasible that their chips will be the stream compute platform of tomorrow.
You might have noticed a little caveat up there in my gloominess about graphics -- the ROPs are a big deal, and they revealed almost nothing about them. For graphics, if Cayman had twice the ROPs it would probably match up evenly with the GTX 580. But AMD couldn't add any more of them, because they are tightly integrated with the memory controllers (4 per controller, and that's it), and they didn't want to make the memory bus any wider because that greatly increases fixed per-card costs. Perhaps now they either have a more flexible ROP architecture, or just double the ROPs per pipeline? GDDR5 certainly has the bandwidth for it.
Sounds like Haswell to me.
No, it doesn't. At all. If it did, Haswell would be utterly awful. First and foremost, it needs to deliver single-thread performance, and Intel knows that.
Also, the point about "before Larrabee" wasn't who started first; it was that this is not a response to anything Intel has done, because work on this had to start before anything about Larrabee was publicly talked about. The basic design for this was likely done more or less immediately after AMD and ATi merged. Designs converge not because people copy each other, but because, given a task, there is usually a best way to do it.