AMD gave details on its next GPU architecture at AFDS, goes Scalar and Out of Order.

Nemesis 1 · Jun 17, 2011

Lonyo said:
Because they have been planning this for longer than Nights Ferry has been in the works and they couldn't exactly make significant changes this late in the game.

I not going to debate you on this . But your wrong about who was planning what and when . Larrabee was choosen over 2 other designs back in 2005 . But the research started befor 05. I can get the link but its useless to link anything around here.

Nemesis 1 · Jun 17, 2011

Lonyo said:
Because they have been planning this for longer than Nights Ferry has been in the works and they couldn't exactly make significant changes this late in the game.

I already replied to this but yet another missing post . I guess someone didn't like it . But your wrong about who planed what. I have alot of research on another PC. Its easy to find what ever I want . I won't link but I will give you the keys to find the truth as to whay AMD had to buy ATI .

Kevet and Keifer were a mini-core and a CPU made of 32 of those cores respectively aimed at server workloads. It was four times what Niagara was reaching for, but also five years later. Intel is going for the swarm of CPUs on a slab approach to high performance CPUs, and more importantly, is going to upgrade the chips on a much swifter cycle than we've been used to.( from Inquirer)

Nemesis 1 · Jun 17, 2011

Suddenely the missing post appears

Nemesis 1 · Jun 17, 2011

Ryan Smith said:
And either is Graphics Core Next. The ability to reorder and preempt threads (or rather wavefronts in this case) is not the same thing as reordering instructions within a wavefront. This is no more out of order than an Atom is.

We'll have more tomorrow; Anand & I are still working on our article.

Try to include more than NV . Nights ferry and nights corner belong in this along with AVX vex prefix and Lrbn If ya really want to get into ya have to look at Niagara Sparc and were that tech comes from which will lead you to the key Elbrus. and after Intel bought elbrus(2004) after the iron curtain rusted away.Sun had to redo the lincenes agreement with intel. This all occurred befor the cold war was over and the iron curtain was in place which put Sun in a world of hurt.

-Slacker- · Jun 17, 2011

You talk about missing posts, but all I see is a quadruple post.

Anyway, what does all this out of order business mean? Is the 7000 series going to be more like an x86 cpu, in which case, wouldn't that be bad? Don't modern gpus use risc architecture?

Ryan Smith · Jun 17, 2011

Ryan Smith said:
And either is Graphics Core Next. The ability to reorder and preempt threads (or rather wavefronts in this case) is not the same thing as reordering instructions within a wavefront. This is no more out of order than an Atom is.

We'll have more tomorrow; Anand & I are still working on our article.

It took a bit longer than expected to hammer out, but it's done now. We've glossed over some highly technical details (cache organization, certain latencies, etc), but hopefully this gives you guys a good idea of what GCN is capable of as a compute architecture. For graphics you'll have to wait a bit longer until AMD is ready to launch the first product.

http://www.anandtech.com/show/4455/amds-graphics-core-next-preview-amd-architects-for-compute

SirPauly · Jun 17, 2011

Nice article, thanks!

Kenmitch · Jun 17, 2011

Ryan Smith said:
It took a bit longer than expected to hammer out, but it's done now. We've glossed over some highly technical details (cache organization, certain latencies, etc), but hopefully this gives you guys a good idea of what GCN is capable of as a compute architecture. For graphics you'll have to wait a bit longer until AMD is ready to launch the first product.

http://www.anandtech.com/show/4455/amds-graphics-core-next-preview-amd-architects-for-compute

Thanks! Interesting article

Elfear · Jun 17, 2011

I feel a little let down with the move to SIMDs. Admittingly, I don't understand all the ins and outs of gpu architecture but it seems AMD is making a compromise like Fermi between compute and graphics capabilities. I understand AMD feels like this will be the best move for future profitability, but as a consumer who buys a graphics card almost exclusively for gaming, I don't want to see massive dies for similar performance (compared to VLIW). That just tells me I'm going to have to pay extra money for features I'll never use. Hopefully AMD will prove me wrong though.

badb0y · Jun 17, 2011

This move actually makes sense for AMD, they are integrating GPUs into their CPUs so moving toward a design that is focused on compute will supplement their CPU's.

gorobei · Jun 17, 2011

the move to simd is fine, vliw only worked well when the gpu was only processing x/y/z vector data on dx7/8/9. Eventually dx11 and up will need new and different architecture to get the increases in performance we are used to.

Also if this results in synergy with apu's then you could start to see some real gains in physics simulations, realtime radiosity lighting, and overall graphics.

Rezist · Jun 17, 2011

Elfear said:
I feel a little let down with the move to SIMDs. Admittingly, I don't understand all the ins and outs of gpu architecture but it seems AMD is making a compromise like Fermi between compute and graphics capabilities. I understand AMD feels like this will be the best move for future profitability, but as a consumer who buys a graphics card almost exclusively for gaming, I don't want to see massive dies for similar performance (compared to VLIW). That just tells me I'm going to have to pay extra money for features I'll never use. Hopefully AMD will prove me wrong though.

I agree with this post, I guess if the graphics work load changes this will be good change but currently it seems amd enjoys the better performance/watt thanks to the VLIW architecture.

Nemesis 1 · Jun 17, 2011

Kenmitch said:
Thanks! Interesting article

Sounds like Haswell to me.

podspi · Jun 17, 2011

I'm slightly worried about this as well. If things like PhysX (the idea, not actually PhysX) really catch on, it might not matter, since compute performance will be helpful then.

I realize the chances of this are almost non-existent, but I wonder if AMD could build a compute/VLIW hybrid. I wonder if that idea is as silly as it sounds

Nemesis 1 · Jun 17, 2011

-Slacker- said:
You talk about missing posts, but all I see is a quadruple post.

Anyway, what does all this out of order business mean? Is the 7000 series going to be more like an x86 cpu, in which case, wouldn't that be bad? Don't modern gpus use risc architecture?

Ya got to get that post count up. NO The first post on the page was missing. I checked time stamp thats inorder. But I have in the past seen a running time stamp at this forum.

Nemesis 1 · Jun 17, 2011

podspi said:
I'm slightly worried about this as well. If things like PhysX (the idea, not actually PhysX) really catch on, it might not matter, since compute performance will be helpful then.

I realize the chances of this are almost non-existent, but I wonder if AMD could build a compute/VLIW hybrid. I wonder if that idea is as silly as it sounds

Well if You had something like an elbrus compiler that has been in the works sense 1982 I believe its very likely. AMD and microsoft teamed up on AMD 64 in the past. SO MS is now working on AMP . This time I hope it blows up in their faces.

Joseph F · Jun 17, 2011

podspi said:
I wonder if they are going to start releasing compute-only cards with this arch, perhaps for Cray? :awe:

http://en.wikipedia.org/wiki/AMD_FireStream

The FireStream variants will probably be released a few months after the Radeon/FirePro cards are released. Now if only there was a video editing suite that could be FS-driven... :awe:

Dark Shroud · Jun 18, 2011

Joseph F said:
Now if only there was a video editing suite that could be FS-driven... :awe:

Adobe will be updating soon so both companies cards can be used. They've said the only reason they went Cuda instead of OpenCL at the start was OpenCL wasn't done yet.

Cerb · Jun 18, 2011

Rezist said:
I agree with this post, I guess if the graphics work load changes this will be good change but currently it seems amd enjoys the better performance/watt thanks to the VLIW architecture.

AMD enjoys more number crunching ability per Watt thanks to VLIW. VLIW is not a panacea, and there's no reason to think that this will not be categorically better. It's all about making compromises. VLIW let them pack more functional units than other approaches, but VLIW is often hard to keep fed, and often-times impossible to reconcile with data dependencies, even with a perfect compiler, and almost always needs more Icache than scalar. nVidia made gaming a 2nd priority by the GT200, and went massively multicore, multithread, and with very robust memory management, most of which does practically nothing but waste space and power, for most users.

Nemesis 1 said:
Well if You had something like an elbrus compiler that has been in the works sense 1982 I believe its very likely. AMD and microsoft teamed up on AMD 64 in the past. SO MS is now working on AMP . This time I hope it blows up in their faces.

I hope it gets widespread use, and non-Windows implementations are allowed to exist and flourish (the details about that seemed nonexistent, but I hope MS realizes that not keeping a monopoly on it is in their best interests). There's no reason Intel could not easily support AMP. It will only require software work, on their part, and I'd bet that their parallel C++ and OpenCL compiler work has them 90% of the way there, already.

Nemesis 1 said:
SO what makes you think AMD has't found out the performance of Nights ferry in HPC work. YOU do know Nights corner is coming on 22nm right its for HPC market and developers have it(nights ferry) . Looks to me like AMD is scrambling .

Nah. Intel is going with cache coherent x86, big caches, small thread counts, and traditional notions of threads and contexts. They will be competing in overlapping markets, but with little overlap in potential customers, which will be good for both companies.

Arkadrel · Jun 18, 2011

Ryan Smith said:
It took a bit longer than expected to hammer out, but it's done now. We've glossed over some highly technical details (cache organization, certain latencies, etc), but hopefully this gives you guys a good idea of what GCN is capable of as a compute architecture. For graphics you'll have to wait a bit longer until AMD is ready to launch the first product.

http://www.anandtech.com/show/4455/amds-graphics-core-next-preview-amd-architects-for-compute

Now that is a intresting read, thank you for that.
Also like the illustrations.

instead of VLIW slots going unused due to dependencies, independent SIMDs can be given entirely different wavefronts to work on.

Bound to give big improvements to peformance.
That has to benefit not only compute but graphics too right? unused resources suck reguardless of what you useing your card for.

CU: the Scalar ALU:
The Scalar unit serves to further keep inefficient operations out of the SIMDs, leaving the vector ALUs on the SIMDs to execute instructions en mass

what does a scalar unit do? First and foremost it executes “one-off” mathematical operations. Whole groups of pixels/values go through the vector units together, but independent operations go to the scalar unit as to not waste valuable SIMD time.

Besides avoiding feeding SIMDs non-vectorized datasets, this will also improve the latency for control flow operations...

Again sounds like it ll give big improvements to performance.

..GCN will once more pair its L2 cache with its memory controllers.

...and will be fully coherent so that all CUs will see the same data, saving expensive trips to VRAM for synchronization...

Sounds good, I guess this means cayman was doing "expensive" trps to VRAM for synchronization? So performance found for this as well.

Cayman’s dual graphics engines have been replaced with multiple primitive pipelines, which will serve the same general purpose of geometry and fixed-function processing. Primative pipelines will be responsible for tessellation, geometry, and high-order surface processing among other things. Whereas Cayman was limited to 2 such units, GCN will be fully scalable, so AMD will be able to handle incredibly large amounts of geometry if necessary.

BOOOOOM!.... AMD tessellation performance will skyrocket with new card.
Which is one of those area's where theyve been abit weaker than Nvidia
(atleast in synthetical benchmark programs).

Tuna-Fish · Jun 18, 2011

-Slacker- said:
Anyway, what does all this out of order business mean? Is the 7000 series going to be more like an x86 cpu, in which case, wouldn't that be bad? Don't modern gpus use risc architecture?

Well, in the sense that everything designed after risc is risc.

Current AMD GPUs are VLIW, like Itanium, while nVidia uses a SIMD/MIMD system, much like this new one AMD just revealed.

And OoO is IMHO not the correct word for what they do here. Ultrasuperhyperthreading?

Arkadrel said:
Bound to give big improvements to peformance.
That has to benefit not only compute but graphics too right? unused resources suck reguardless of what you useing your card for.

Not really. A single-precision FMA unit is ~20 000 transistors, and the chips they build have over a billion transistors. They could add a couple of thousand extra of them and it would be hardly noticeable in chip size and power.

What costs is getting the data where it needs to be, when it needs to be there. VLIW was used because it's relatively cheap to implement regards to shuffling data around (4 individual register files, as opposed to one large one, and you only move data between them trough the eu's), and while it only helps in limited situations, graphics was one of them.

Now they have to shuffle their warps between the 4 execution units of the CU -- I guarantee you that in the domain of graphics, they are wasting more power and die area on the new design than the old one. It's just that now it's useful for a lot more things. (And also good for more exotic graphics routines -- I believe John Carmack has been itching to be able to program GPU's more directly to implement some cooler rendering algorithms.)

Again sounds like it ll give big improvements to performance.

Again, mostly for complex tasks. There isn't that much pixel-independent operations or control flow in graphics tasks.

Sounds good, I guess this means cayman was doing "expensive" trps to VRAM for synchronization? So performance found for this as well.

Synchronization was expensive on cayman. Guess how often most graphics engines synchronize anything? A handful of times a frame.

BOOOOOM!.... AMD tessellation performance will skyrocket with new card.
Which is one of those area's where theyve been abit weaker than Nvidia
(atleast in synthetical benchmark programs).

Which is a bit sad. Because in reality, there was no need for this. They were perfectly capable of delivering more than one triangle (for overdraw) per pixel on screen -- any program which wanted more tesselation capability than AMD could provide, was simply wasting it as opposed to using it. Marketing wins yet again.

Basically, all the real improvements I can see are for compute. Ignoring the rops, and unless there is anything they are keeping for themselves, this should do worse for power/area when compared to cayman on the same process node. But it should bring computing parity with NV, and given how much better AMD consistently seems to be on the silicon design side (as opposed to the chip architecture), it's entirely feasible that their chips will be the stream compute platform for tomorrow.

You might have noticed a little caveat up there in my gloominess for graphics -- the rops are a big deal, and they revealed almost nothing about them. For graphics, if Cayman had twice the rops it would probably match up evenly with GTX580. But AMD couldn't add any more of them, because they are tightly integrated with the memory controllers (4 per controller, and that's it), and they didn't want to make the memory bus any wider because it much increases fixed per-card costs. Perhaps now they either have a more flexible rop architecture, or just double the rops per pipeline? GDDR5 certainly has the bandwidth for it.

Nemesis 1 said:
Sounds like Haswell to me.

No, it doesn't. At all. If it did, Haswell would be utterly awful. First and foremost, it needs to deliver single-thread performance, and Intel knows that.

Also, the point about "before larrabee" wasn't who started first, it was that this is not a response to anything that Intel has done, because work on this had to start before anything about larrabee was publicly talked about. The basic design for this was likely done more or less immediately after AMD and ATi merged. Designs converge, not because people copy each other, but because given a task, there usually is the best way to do it.

Cerb · Jun 18, 2011

Arkadrel said:
(VLIW+SIMD->scalar+SIMD)
That has to benefit not only compute but graphics too right? unused resources suck reguardless of what you useing your card for.

Yes. The real key is that compiling code that's (a) low-ILP, and/or (b) has data dependencies that make high IPC difficult, even with high enough theoretical ILP, is a royal pain. I would not expect much, by way of DX9 improvements, nor to DX10 games that are DX9 with some DX10 updates.

The advantage of VLIW is that every FU is told exactly what to do during this cycle, so you don't need all of that register management and instruction issuing crap clogging up the execution units. If you can keep very high IPC, VLIW can even be smaller in program size than scalar, for a given level of performance. They could fit more arithmetic potential in a smaller space, this way. However, as the code does worse and worse at utilizing it, over time, is that really the most efficient use of space (including IO)? At some point, the answer will be no. nVidia decided that as of the FX, while ATI has patiently kept on with it.

The bet is that with being able to shove any incoming language into their backend, and get well-optimized code out, they will end up with better real performance than VLIW, for a given space/Wattage, with newer software, and that will make up for the reduced density, and that the combination of that with being able to put more on the chip with a smaller node will make up for the cases where their VLIW design was ideal. Historically, it's a pretty safe bet.

AMD has been using LLVM for some time now (I suspect that with this, they will be integrating pieces of LLVM through more their drivers and SDK), and I'd be willing to bet that the final decisions for the new design were heavily influenced by what their compiler/driver/SDK tools could do, rather than just letting the hardware guys go at it. If they can make a compute-friendly GPU that is easier to optimize for than nVidia's, and/or that won't really need it for general-purpose compute apps (IE, doing well enough JIT), they will have a winner on their hands, even if it has lower peak performance (as a side effect, it may be easier to do driver performance improvements for games, as well).

BOOOOOM!.... AMD tessellation performance will skyrocket with new card.
Which is one of those area's where theyve been abit weaker than Nvidia
(atleast in synthetical benchmark programs).

Improvements are good, but I think way too much emphasis was put on tessellation performance in benchmarks that don't resemble games. Both AMD and nVidia have, and will have, good enough tessellation performance.

P.S. I wonder if, for high-level work, they have any compiler magic going on for the SIMD? IE, 4xVLIW ABCD -> SIMD Ax4, Bx4, Cx4, Dx4, mapping wide scalar onto SIMD, leaving the scalar unit only for ops that can't be mapped as such? Since, previously, the VLIW units executed many of the same instruction at the same time, in a SIMD-like-but-not-really-SIMD fashion (SIMD of VLIWs, IIRC), such a mapping could work for high-level code, like clean Direct3D (including HLSL) and OpenGL (including GLSL). It would still be less efficient, but likely not too much so.

wahdangun · Jun 18, 2011

Nemesis 1 said:
I not going to debate you on this . But your wrong about who was planning what and when . Larrabee was choosen over 2 other designs back in 2005 . But the research started befor 05. I can get the link but its useless to link anything around here.

yeah, and Larrabee still failed.

btw its really interesting if AMD goes OoO because it will surely boost their GPU performance tremendously but it will also increase their power consumption. and its shame tough if AMD really ditching their VLIW architecture. its mean future card will have crap performance on bitcoin

grimpr · Jun 18, 2011

Another good overview at pcper.

http://www.pcper.com/reviews/Graphi...ecture-Overview-Southern-Isle-GPUs-and-Beyond

Mopetar · Jun 18, 2011

Even if the theoretical maximum graphics performance takes a dip with this new architecture, it's not a big deal. For the vast majority of games you need to run a three monitor set-up in order to tax the really high-end cards.

It's disingenuous to assume that things will remain that way, but there aren't too many games on the horizon that will tax a 580 or 6990 at 1080p. By the time we get to that point, the next generation of cards will likely be able to handle them quite well.

AMD gave details on its next GPU architecture at AFDS, goes Scalar and Out of Order.

Lifer

Lifer

Lifer

Lifer

Golden Member

The New Boss

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Diamond Member

Senior member

Lifer

Golden Member

Lifer

Lifer

Diamond Member

Golden Member

Elite Member

Diamond Member

Golden Member

Elite Member

Golden Member

Golden Member

Diamond Member