Ashes of the Singularity User Benchmarks Thread


Shivansps

Diamond Member
Sep 11, 2013
3,855
1,518
136
It's possible with multi-adapter and an iGPU that supports async compute in hardware; in theory they can do almost anything with that DX12 feature. They could break the screen up into quadrants and send one of them to the iGPU, or they could send compute work only.

If the iGPU is powerful enough, the extra latency should be minimal.

This feature is one of the reasons I am excited about future gaming on DX12. I hate the thought of my iGPU doing jack all while gaming, so if it gets put to use, that's great.

I'm not sure it's even needed for the iGPU to support AC, but if someone has, let's say, a GTX 980 that does not support AC plus a DX12 iGPU (Haswell/Skylake/GCN), a simpler solution I can think of is to send the compute-only work to the iGPU. That would be far easier to implement than doing rendering on it as well, though it could also do that if it supports AC.

Anyway, it could be a way out for Nvidia owners.
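
For what it's worth, the developer-facing side of this is not exotic: under explicit multi-adapter, the iGPU is just a second D3D12 device with its own queues. Below is a minimal sketch (not from any shipping engine) of finding the iGPU and giving it a compute-only queue; picking the adapter by Intel's vendor ID is purely an illustrative assumption, and error handling is omitted.

Code:
// Sketch: enumerate adapters, create a D3D12 device on the iGPU and give it
// a compute-only queue. Assumes an Intel iGPU (vendor ID 0x8086) purely for
// illustration.
#include <d3d12.h>
#include <dxgi1_4.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

int main()
{
    ComPtr<IDXGIFactory4> factory;
    CreateDXGIFactory1(IID_PPV_ARGS(&factory));

    ComPtr<IDXGIAdapter1> adapter, igpuAdapter;
    for (UINT i = 0; factory->EnumAdapters1(i, &adapter) != DXGI_ERROR_NOT_FOUND; ++i)
    {
        DXGI_ADAPTER_DESC1 desc;
        adapter->GetDesc1(&desc);
        if (desc.Flags & DXGI_ADAPTER_FLAG_SOFTWARE)
            continue;                      // skip WARP / software adapters
        if (desc.VendorId == 0x8086)       // assumption: the Intel adapter is the iGPU
            igpuAdapter = adapter;
    }
    if (!igpuAdapter)
        return 0;                          // no iGPU found, nothing to offload to

    // The iGPU becomes a second, independent D3D12 device...
    ComPtr<ID3D12Device> igpuDevice;
    D3D12CreateDevice(igpuAdapter.Get(), D3D_FEATURE_LEVEL_11_0,
                      IID_PPV_ARGS(&igpuDevice));

    // ...and a compute-only queue on it is where the offloaded work would be submitted.
    D3D12_COMMAND_QUEUE_DESC queueDesc = {};
    queueDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    ComPtr<ID3D12CommandQueue> igpuComputeQueue;
    igpuDevice->CreateCommandQueue(&queueDesc, IID_PPV_ARGS(&igpuComputeQueue));

    // Results would then have to be copied back across PCIe (e.g. via a
    // cross-adapter shared heap), which is the latency cost discussed later in the thread.
    return 0;
}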
 

monstercameron

Diamond Member
Feb 12, 2013
3,818
1
0
I'm not sure it's even needed for the iGPU to support AC, but if someone has, let's say, a GTX 980 that does not support AC plus a DX12 iGPU (Haswell/Skylake/GCN), a simpler solution I can think of is to send the compute-only work to the iGPU. That would be far easier to implement than doing rendering on it as well, though it could also do that if it supports AC.

Anyway, it could be a way out for Nvidia owners.

Doesn't make any sense, because that would mean the work would have to cross the PCIe bus, which would add more latency than the slow context switches.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
That's all you've been doing w.r.t. AC for the last week, isn't it?

What I have been doing is trying to get to the truth. Despite what the Oxide dev stated, I had a hard time believing that Maxwell 2 lacked the capability to do asynchronous compute.

Looks like my skepticism may have paid off..

Ya, so more evidence that you think GPUs should focus on maximizing short-term performance, not worry about any next-generation games or any next-generation VRAM requirements. Again, then, why do you care so much about how Maxwell will perform in DX12 or its ACE functionality? It contradicts your statements that you don't think GPUs should be forward-thinking in their design.

There's nothing wrong with being forward thinking. It becomes a problem when you develop features intended solely for next-gen use four or five years down the road, and those features take up die space while sitting essentially inert.

That's too big of a time gap for those features to be useless if you ask me.

This has been covered years ago -- AMD cannot afford to spend billions of dollars to redesign brand new GPU architectures like NV can given AMD's financial position and also the fact that their R&D has to finance CPUs and APUs. NV can literally funnel 90%+ of their R&D into graphics ONLY.

And I understand that, but it still doesn't change the fact that this played a major part in AMD losing a lot of market share to NVidia. Although many people buy GPUs with the intent to keep them for 3 years or more, it DOES NOT MATTER as long as NVidia keeps beating AMD in the initial phase, which is the most essential phase because that's what affects public perception the most.

After Kepler was released, nobody but industry insiders could have known that its performance would diminish so rapidly because of its meager compute capabilities, whilst GCN's would rise as developers started to use compute shaders more and more in their engines.

But by then it was too late anyway.

Therefore, AMD needed to design a GPU architecture that was flexible and forward looking when they were replacing VLIW. That's why GCN was designed from the start to be that way. When HD7970 launched, all of that was covered in great detail. Back then I still remember you had GTX580 SLI and you upgraded to GTX770 4GB SLI. In the same period, HD7970 CF destroyed 580s and kept up with 770s but NV had to spend a lot of $ on Kepler. Then NV moved to Maxwell and you got 970 SLI and then 980SLI but AMD simply enlarged HD7970 with key changes into R9 290X.

I already addressed this above. As long as NVidia is able to dominate AMD in the initial phase of any new product release, then AMD has no chance. It's the initial phase which creates the lasting impression.

By the way, are you keeping tabs on my hardware changes? :sneaky:

But that's why I keep asking, why do you in particular care about DX12 and AC? It's not as if you'll buy an AMD GPU and it's not as if you won't upgrade to 8GB+ HBM2 Pascal cards when they are out. Therefore, for you specifically, I am not seeing how it even matters and yet you seem to have a lot of interest in defending Maxwell's AC implementation, much like to this day you defend Fermi's and Kepler's poor DirectCompute performance. That's why it somewhat comes off like PR damage control for NV or something along those lines. Since you will have upgraded your 980s to Pascal anyway, who cares if 980 hypothetically loses to a 390X/Fury in DX12? Doesn't matter to you.

Well I'm partial to NVidia, but I also care about truth. NVidia has gotten an unfair shake on the internet lately because of irresponsible remarks by a certain developer, and a certain ex ATi employee with an agenda.

But I don't know what you're talking about when you say I defend Fermi's and Kepler's poor DirectCompute performance. Fermi had very good compute performance, Kepler's was probably below average for GK104, and above average for GK110.

But by the time compute shaders really became an industry trend, NVidia had released Maxwell which has very strong compute performance. So once again, NVidia was on time when it comes to anticipating industry trends.

Even when AMD had the HD4000-7000 series and had massive leads in nearly every metric vs. NV, AMD's GPU division was hardly gaining market share, and in the rare cases where they did gain market share (the HD5850/5870 6-month period), it was a loss-leader strategy long term with low prices, and frankly by the end of the Fermi generation NV gained market share. In other words, NONE of AMD's previous price/performance strategies worked to make $. Having 50-60% market share and making $0 or losing $ is akin to having 50-60% of "empty market share." In business terms, that's basically worthless market share. It's like Android having almost 90% market share worldwide but Apple makes 90% of the profits.

Yep, AMD has historically been subjected to bad management year after year.

1) AMD implemented an optimization in the drivers to vary the tessellation factor since the performance hit was much greater on AMD's hardware that cannot handle excessive tessellation factors;

Well this might have been enabled in the HardOCP review then if it's on by default in the drivers and inflated the frame rates for AMD.

2) Actual user experience. I trust that far more than any review HardOCP does.

I've heard people accuse HardOCP of being biased towards NVidia or AMD, usually depending on the results of a particular review :sneaky:

Good thing there are objective professional sites we can rely upon to tell us the truth:

Techspot tested the Witcher 3 shortly after release. The HardOCP review that I posted was done months after release, with many more optimizations for GW so the two aren't really comparable.

TressFX seems far more efficient than HairWorks as well (or alternatively it doesn't use worthless tessellation factors to kill performance).

TressFX is terrible compared to HairWorks. In Tomb Raider, Lara looks like she's underwater or in outer space as her hair seems to defy gravity, and she's the only entity which uses it.

In the Witcher 3 on the other hand, there are multiple entities with HW enabled which look far superior to when it is disabled. Geralt himself is probably the worst example of HW in the game if you ask me.

But other creatures like wolves, horses, monsters etcetera look way better with HW enabled.
 
Last edited:

Shivansps

Diamond Member
Sep 11, 2013
3,855
1,518
136
Doesn't make any sense, because that would mean the work would have to cross the PCIe bus, which would add more latency than the slow context switches.

But the work always has to cross the PCIe bus one way or the other, so in which cases do you see this as being a problem?

Take into consideration the improvements DX12 made to multi-GPU, like the single memory pool.
 
Last edited:

monstercameron

Diamond Member
Feb 12, 2013
3,818
1
0
There's nothing wrong with being forward thinking. It becomes a problem when you develop features intended solely for next-gen use four or five years down the road, and those features take up die space while sitting essentially inert.

That's too big of a time gap for those features to be useless if you ask me.
If there is no hardware to research new techniques on, how will devs ever move forward outside of the limited innovations from the OEMs?

But the work always has to cross the PCIe bus one way or the other, so in which cases do you see this as being a problem?

Take into consideration the improvements DX12 made to multi-GPU, like the single memory pool.

not as simple as that.
 
Last edited:

Vesku

Diamond Member
Aug 25, 2005
3,743
28
86
Well I'm partial to NVidia, but I also care about truth. NVidia has gotten an unfair shake on the internet lately because of irresponsible remarks by a certain developer, and a certain ex ATi employee with an agenda.

This whole Async Compute snowball started rolling when Nvidia PR said Oxide's AotS benchmark was buggy. So far it seems that the issues are on Nvidia's side of the gaming equation.
 
Last edited:

Tapoer

Member
May 10, 2015
64
3
36
Mahigan at overclock.net keeps ignoring me when I mention the possibility of using DX12's asymmetric multi-GPU capability for something as simple as sending the compute tasks to a secondary DX12 device. I find that very, very suspicious.

You should see AC as if it were SMT on a CPU.

The objective is to use all the resources of a core: since one thread per core is not enough to max it out and there is a limit to what an OoO engine can do, a second thread helps and gains ~20% extra performance.

What you are asking is: instead of using the logical core on core #1, why not use the full core #2 (the other GPU)?
Yes, they could, but they would leave performance on the table on core #1, and they can feed core #2 with other tasks too.

Epic did that when they were showing the multi-vendor features of D3D12: they used the iGPU to perform some post-processing and gained more performance.

But then again, without AC it would be almost mission impossible to use most of the GPU's power.
Even on consoles, with all the documentation and fixed hardware, they still get higher performance with AC. Now imagine on the PC: even if a developer were to waste an insane amount of time optimizing their shader code and compute for only one GPU type, they would get as much of a boost or more from AC, and in much less time.

The learning process for using AC should be similar to using multi-vendor GPUs in D3D12 and Vulkan.
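
To make the SMT analogy concrete on the API side: "using AC" mostly means submitting work on a second, compute-type queue alongside the normal direct (graphics) queue and letting the GPU fill otherwise idle units. A minimal sketch of that submission structure, with deliberately empty command lists and no real shaders, might look like this (error handling omitted):

Code:
// Sketch: one device, a direct (graphics) queue plus a compute queue.
// Command lists are left empty here; in a real engine each would carry
// rendering or compute work that the GPU may overlap.
#include <d3d12.h>
#include <wrl/client.h>

using Microsoft::WRL::ComPtr;

int main()
{
    ComPtr<ID3D12Device> device;
    D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0, IID_PPV_ARGS(&device));

    // Two queues: the "main thread" (graphics) and the "second SMT thread" (compute).
    D3D12_COMMAND_QUEUE_DESC directDesc = {};
    directDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    D3D12_COMMAND_QUEUE_DESC computeDesc = {};
    computeDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;

    ComPtr<ID3D12CommandQueue> directQueue, computeQueue;
    device->CreateCommandQueue(&directDesc, IID_PPV_ARGS(&directQueue));
    device->CreateCommandQueue(&computeDesc, IID_PPV_ARGS(&computeQueue));

    // One allocator + command list per queue type.
    ComPtr<ID3D12CommandAllocator> directAlloc, computeAlloc;
    device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_DIRECT, IID_PPV_ARGS(&directAlloc));
    device->CreateCommandAllocator(D3D12_COMMAND_LIST_TYPE_COMPUTE, IID_PPV_ARGS(&computeAlloc));

    ComPtr<ID3D12GraphicsCommandList> directList, computeList;
    device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_DIRECT, directAlloc.Get(), nullptr,
                              IID_PPV_ARGS(&directList));
    device->CreateCommandList(0, D3D12_COMMAND_LIST_TYPE_COMPUTE, computeAlloc.Get(), nullptr,
                              IID_PPV_ARGS(&computeList));
    directList->Close();   // empty lists, just to show the submission structure
    computeList->Close();

    // Submitting to both queues is the async compute part: whether the work
    // actually overlaps is up to the hardware/driver, not this code.
    ID3D12CommandList* d[] = { directList.Get() };
    ID3D12CommandList* c[] = { computeList.Get() };
    directQueue->ExecuteCommandLists(1, d);
    computeQueue->ExecuteCommandLists(1, c);
    return 0;
}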
 
Last edited:

antihelten

Golden Member
Feb 2, 2012
1,764
274
126
Not true. With larger caches, VRAM, registers etcetera, CPUs and GPUs can hold more data, which reduces the latency penalty dramatically.

I'll give you an example. VRAM stores only graphics-related data, like textures, shaders, meshes, etcetera. During boot-up, the game loads some of its data into VRAM from system memory and storage. And when you load a saved game or start a new game, more data is again loaded into VRAM during the loading screen while shaders are compiled.

So basically, a well-programmed game, or any software, is already loading its working set into memory for fast access, rather than pulling it from storage when it needs it.

And during gameplay, there is a lot of shuffling of data between VRAM and system RAM and between the CPU and GPU. Having access to larger memory pools reduces this shuffling, and thus minimizes the access latency which can cause stuttering or frame drops.

Yet with all of this shuffling of data, we are still able to get very high performance in PC games compared to consoles like the PS4 which have much lower theoretical latency due to being integrated on the same die and having HUMA..

Even low end gaming PCs are outperforming the consoles despite being handicapped by an API with much higher overhead and using NUMA architecture..

Yes, true. Caches, memory and registers will not reduce the size of a latency penalty from communication between the CPU and the GPU; they can, however, be used to reduce how often you experience the latency (which I mentioned in the previous post, but you apparently ignored that).

And your example is fairly useless, since loading up textures and meshes is not at all like communicating scheduling. We're not dealing with shuffling around working sets here, we're dealing with scheduling data. Two types of data that have very different timing dependencies.

You cannot load up your scheduling ahead of time, unless you have somehow come up with a way to make your rendering engine psychic (which is not entirely impossible actually, but it does require that your pipeline is extremely regular, which is rarely the case).
 
Last edited:

PPB

Golden Member
Jul 5, 2013
1,118
168
106
He replies to everything and has ignored me 5 times.

And it does make sense.

It doesn't make sense, because your iGP should have AC capabilities in the first place if it is already DX12-ready for multi-adapter. Then again, why would you offload compute to the iGP, which also has AC, when you can actually do multi-adapter GPU-iGP AND AC at the same time? dGPU for rasterization and iGP for compute is just taking half of your benefits with DX12 when you could in fact do what I just said for even more performance (especially considering games will start to leverage even more compute capabilities in GPUs down the road).

Seems more of an attempt to help Nvidia save face in this fiasco rather than a thoughtful approach. And in the end, even if possible and effective, it will still be an inferior approach to doing multi-adapter + AC (or even AC only, for that matter). Nvidia needs to have AC working; shenanigans won't do this time.
 

vbored

Junior Member
Sep 7, 2015
12
2
41
not as simple as that.

Using the iGPU alongside a dGPU was something I found interesting about DX12, like Silverforce did. It would be nice if some of the physics APIs could start offloading to the iGPU; I could see Havok doing it, not so much PhysX.

I agree it's not simple but it also looks like we could see some of this stuff down the line.

An Oxide dev shows it being done with an AMD GPU/APU but says it would work with other configs.
https://www.youtube.com/watch?v=9cvmDjVYSNk

This article has a video of an Nvidia GPU and an Intel iGPU being used together:
http://blogs.msdn.com/b/directx/arc...rmant-silicon-and-making-it-work-for-you.aspx

Hopefully in the not-too-distant future the iGPU won't just be a waste of space in my CPU package.
 

railven

Diamond Member
Mar 25, 2010
6,604
561
126
This thread has been an interesting roller coaster: from "Maxwell 2 DOESN'T have AC" to "Nvidia hasn't activated it in the driver."

As much as I don't like a particular NV-Pro poster, he called it. Kudos to him for sticking to his guns.

Interested to see what rabbit NV pulls out before this game hits shelves. I can only imagine the signatures this thread will produce.
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
This is what is being circulated as the problem/fix with async compute on nVidia vs. AMD
SOURCE

“The Asynchronous Warp Schedulers are in the hardware. Each SMM (which is a shader engine in GCN terms) holds four AWSs. Unlike GCN, the scheduling aspect is handled in software for Maxwell 2. In the driver there’s a Grid Management Queue which holds pending tasks and assigns the pending tasks to another piece of software which is the work distributor. The work distributor then assigns the tasks to available Asynchronous Warp Schedulers. It’s quite a few different “parts” working together. A software and a hardware component if you will.

With GCN the developer sends work to a particular queue (Graphic/Compute/Copy) and the driver just sends it to the Asynchronous Compute Engine (for Async compute) or Graphic Command Processor (Graphic tasks but can also handle compute), DMA Engines (Copy). The queues, for pending Async work, are held within the ACEs (8 deep each)… and ACEs handle assigning Async tasks to available compute units.

Simplified…

Maxwell 2: Queues in Software, work distributor in software (context switching), Asynchronous Warps in hardware, DMA Engines in hardware, CUDA cores in hardware.
GCN: Queues/Work distributor/Asynchronous Compute engines (ACEs/Graphic Command Processor) in hardware, Copy (DMA Engines) in hardware, CUs in hardware.”

So "supports async compute" seems to not be so straight forward with nVidia as many of the tasks are being done by the CPU. Tasks that should be done by the GPU but Maxwell 2 (or Maxwell, Kepler, Fermi) can't do it.

I'm curious to see how effective this is. The whole purpose of async compute is to more efficiently use resources on the GPU. Is this a truly efficient process, or will having the CPU involved cause slowdowns? Are we freeing CPU resources with DX12 only to bog it back down with additional tasks that are supposed to be run by the GPU? Does it mean that the Dev has to write a separate path for nVidia? Does it mean we can have the pathway for a particular vendor left out so you can't use async compute unless you have a particular brand card? Or can nVidia's drivers just take the information they need out of the standard DX12 pipeline to execute it in software on the CPU?

At the very least it seems counterproductive from a "low level API" perspective. At worst, it can end up being wielded like a proprietary software feature, which would just stifle progress as it does in DX11.
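
On the "separate path for nVidia" question: the D3D12 code a developer writes for async compute is the same for every vendor. The dependency between a compute queue and the graphics queue is expressed with a fence, and whether that wait is resolved by hardware schedulers (GCN's ACEs) or by driver-side software (as the quote above describes for Maxwell 2) is invisible at this level. A hedged sketch of that fence handshake, assuming the queues, fence and command lists already exist (the function name is hypothetical):

Code:
// Sketch: the vendor-agnostic fence handshake between a compute queue and a
// graphics queue in D3D12. Whether the wait is honoured by hardware schedulers
// or by driver-side software scheduling does not change this code.
#include <d3d12.h>

void SubmitFrame(ID3D12CommandQueue* computeQueue,
                 ID3D12CommandQueue* graphicsQueue,
                 ID3D12Fence*        fence,
                 ID3D12CommandList*  computeWork,
                 ID3D12CommandList*  graphicsWork,
                 UINT64&             fenceValue)
{
    // 1. Kick off the compute work (e.g. culling, lighting, post-processing input).
    ID3D12CommandList* c[] = { computeWork };
    computeQueue->ExecuteCommandLists(1, c);

    // 2. Have the compute queue signal the fence when that work is done.
    ++fenceValue;
    computeQueue->Signal(fence, fenceValue);

    // 3. Tell the graphics queue to wait (on the GPU timeline, not the CPU)
    //    until the fence reaches that value before consuming the results.
    graphicsQueue->Wait(fence, fenceValue);

    // 4. Submit the graphics work that depends on the compute output.
    ID3D12CommandList* g[] = { graphicsWork };
    graphicsQueue->ExecuteCommandLists(1, g);
}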
 

Glo.

Diamond Member
Apr 25, 2015
5,711
4,559
136
That explains why at 4K resolution we see a difference in performance between 4- and 6-core CPUs with Nvidia hardware. It might be a big problem, because it adds a layer of abstraction for Nvidia hardware to know how to handle the commands in AC. GCN handles it at the hardware level.

Anyway, we still need more data to get confirmation on this. But so far, everything points to Nvidia hardware not being able to do AC properly without the software layer. It looks like, because the API talks directly to the GPU, specific code has to be added to the game engine for Nvidia hardware.
 

Sabrewings

Golden Member
Jun 27, 2015
1,942
35
51
It looks like, because the API talks directly to the GPU, specific code has to be added to the game engine for Nvidia hardware.

No API on PCs talks directly to the GPU. That's not how it works. The driver is always in the middle, providing translation. The point of a lower-level API is that it puts the API closer to the GPU, but the driver is still required to tell the system how to talk to that particular architecture.

I've been pretty quiet on this thus far as a Maxwell owner. It is a bit of a concern, though I never planned to have Maxwell longer than until Pascal launches. If Nvidia can accomplish it in software without a performance loss, I really don't care.

Are we freeing CPU resources with DX12 only to bog it back down with additional tasks that are supposed to be run by the GPU?

And what documentation says that scheduling of tasks is "supposed" to be done by the GPU?
 
Last edited:

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
That explains why at 4K resolution we see a difference in performance between 4- and 6-core CPUs with Nvidia hardware. It might be a big problem, because it adds a layer of abstraction for Nvidia hardware to know how to handle the commands in AC. GCN handles it at the hardware level.

Anyway, we still need more data to get confirmation on this. But so far, everything points to Nvidia hardware not being able to do AC properly without the software layer. It looks like, because the API talks directly to the GPU, specific code has to be added to the game engine for Nvidia hardware.

I hope not. I hope they can make it work through the standard DX12 pipeline. If there has to be specific code there will be shenanigans. Guaranteed!
 

3DVagabond

Lifer
Aug 10, 2009
11,951
204
106
And what documentation says that scheduling of tasks is "supposed" to be done by the GPU?

Well, let's see... When nVidia tried claiming Oxide was writing buggy code, Oxide said they sent the source to AMD, Intel, Msft, and nVidia. They write their code adhering to the DX12 protocols set by Msft. When done that way, the scheduling of the tasks is done by the GPU.
 

Sabrewings

Golden Member
Jun 27, 2015
1,942
35
51
Well, let's see... When nVidia tried claiming Oxide was writing buggy code, Oxide said they sent the source to AMD, Intel, Msft, and nVidia. They write their code adhering to the DX12 protocols set by Msft. When done that way, the scheduling of the tasks is done by the GPU.

So, you're making a loose supposition? Show the documentation.
 

TheELF

Diamond Member
Dec 22, 2012
3,973
731
126
This is what is being circulated as the problem/fix with async compute on nVidia vs. AMD
SOURCE



So "supports async compute" seems to not be so straight forward with nVidia as many of the tasks are being done by the CPU. Tasks that should be done by the GPU but Maxwell 2 (or Maxwell, Kepler, Fermi) can't do it.
...
snip
...
At the very least it seems counterproductive from a "low level API" perspective. At worst, it can end up being wielded like a proprietary software feature, which would just stifle progress as it does in DX11.

the scheduling aspect is handled in software for Maxwell 2
With GCN the developer sends work to a particular queue
Not the same thing?
The devs have to program the scheduler and whatever other software parts into the game, WHICH IS RUNNING ON THE CPU; THAT is what low level is. Async and the ACEs don't get commands from the curvature of space, they have to come from software, so again it comes down to what Oxide has done. If Oxide is worse at programming for Nvidia than they are at programming for AMD, then things are clear.
 

railven

Diamond Member
Mar 25, 2010
6,604
561
126
I find unfounded statements to improve a particular position humorous.

Some people want an AMD win so bad they'll take the words from an AMD sponsored Dev and EX-ATI employee to heart.

This game better be amazing for all the turmoil its causing. Haha.
 

Zstream

Diamond Member
Oct 24, 2005
3,396
277
136
Some people want an AMD win so bad they'll take the words from an AMD sponsored Dev and EX-ATI employee to heart.



This game better be amazing for all the turmoil its causing. Haha.


Some may feel that way, but I assure you, those waiting in the wings are those in my position. I'm running an Intel 2400 and a 270.

Those constantly upgrading probably won't care. Just remember, the market percentage of those willing to shell out cash every year is dwindling and will be for quite some time.

I was going to upgrade to a 970 but decided to wait. I'm glad I waited, as the 3.5GB issue showed itself. More recently I almost upgraded to a 290, but decided against it since DX12 is coming and I should just wait to see some preliminary results.

In conclusion, not everyone is a partisan hack. Just remember that.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
If there is no hardware to research new techniques on, how will devs ever move forward outside of the limited innovations from the OEMs?

The devs will use whatever is given to them by the IHVs and by ISVs like Microsoft and the Khronos Group. The devs innovate by themselves by coming up with new and more efficient algorithms for visual effects.

As an example, Lionhead Studios came up with a new solution for dynamic global illumination that will be used in Fable Legends and was actually integrated into Unreal Engine 4.

This whole Async Compute snowball started rolling when Nvidia PR said Oxide's AotS benchmark was buggy. So far it seems that the issues are on Nvidia's side of the gaming equation.

It probably was buggy, seeing as it was in alpha. But yes, you're right that the main issue with NVidia's performance had to do with NVidia themselves.
 

Carfax83

Diamond Member
Nov 1, 2010
6,841
1,536
136
You cannot load up your scheduling ahead of time, unless you have somehow come up with a way to make your rendering engine psychic (which is not entirely impossible actually, but it does require that your pipeline is extremely regular, which is rarely the case).

Are you serious? o_O

Software does this ALL OF THE TIME. Why do you think CPUs have branch prediction units? Rendering is an ordered process, so scheduling ahead of time is absolutely feasible and does not require a rendering engine to be psychic.

Games have been doing this for years. If a game is linear, which most games are, then scheduling ahead of time is easy. Even with the latest open-world games like the Witcher 3, it can still be done effectively, and in such a way that NO LOADING screens are necessary.

You completely underestimate the ingenuity of developers in overcoming obstacles..
 

ocre

Golden Member
Dec 26, 2008
1,594
7
81
This is what is being circulated as the problem/fix with async compute on nVidia vs. AMD
SOURCE



So "supports async compute" seems to not be so straight forward with nVidia as many of the tasks are being done by the CPU. Tasks that should be done by the GPU but Maxwell 2 (or Maxwell, Kepler, Fermi) can't do it.

I'm curious to see how effective this is. The whole purpose of async compute is to more efficiently use resources on the GPU. Is this a truly efficient process, or will having the CPU involved cause slowdowns? Are we freeing CPU resources with DX12 only to bog it back down with additional tasks that are supposed to be run by the GPU? Does it mean that the Dev has to write a separate path for nVidia? Does it mean we can have the pathway for a particular vendor left out so you can't use async compute unless you have a particular brand card? Or can nVidia's drivers just take the information they need out of the standard DX12 pipeline to execute it in software on the CPU?

At the very least it seems counterproductive from a "low level API" perspective. At worst, it can end up being wielded like a proprietary software feature, which would just stifle progress as it does in DX11.

So let's be real about this. Please, guys.

Async compute is something AMD was promoting for GCN; we had the AnandTech article several months ago that introduced us to their ACEs and spoke about how they could really be useful. This was something AMD talked up; it was something they were proud of. And all of what they said about it may very well be true. This is something that is supposed to be great for GCN, technology that helps their architecture do more and perform better.

The AnandTech article talked up the ACEs, but they also went out to contact Nvidia on the matter. Their article went on to include that Nvidia can do the 1 graphics + 32 compute. We have since heard so many forum warriors giving their "expert" analysis of this, some going as far as denouncing AnandTech and calling them wrong, demanding they change their article... all this with so little real knowledge and information.

We must remember, it was AMD that was hyping this ability up. And it really may be very useful for their architecture. If you think about it, the more gaps and resources sitting there with nothing to do, the more theoretical performance is left for async compute to exploit.

Perhaps the ACE was designed for a particular problem AMD set out to address. Perhaps Nvidia had a very different approach to keeping their cores busy. See, Nvidia has been competing very well with some seemingly anemic paper specs for a good while now. Their cards have fewer resources and have been holding up against cards with much more theoretical GFLOPS. The only way this is possible is that Nvidia has been extracting more real-world performance out of their designs. They are obviously utilizing their HW differently, and it has been working.

Whether some of their magic has been their software scheduler is up for debate. But remember, it was AMD talking up their ACEs and async compute. To be real about it, this is something they went out of their way to promote. It may be a way to extract more from GCN. That is great!!

But we don't know what kind of edge this will give them going down the road. Nvidia looks to have taken a very different approach, but you have to admit they have been extracting a lot of performance out of their designs for a long time now. I can't sit here and say that their approach is inferior; it seems to have paid off very, very well for Nvidia. I can't even say that AMD's ACEs are the better route or one that Nvidia should have taken. See, just because it might do wonders for GCN... let's say it is even great and will be awesome for GCN, that still doesn't mean that it would ever have been feasible or as beneficial for Maxwell.

These are very different designs and different approaches. Trade offs and compromises are the reality. What I think matters the most, it is the outcome. At the end of the day, it is the performance.

We can say and claim whatever we want, but no one knows how asynchronous compute will change the game for AMD... or even if it will at all. As interesting as all this has been, we don't know what edge this might give. What we do know is that Nvidia seems to have taken a different approach that, up to this point, has served them very well. It could just as well serve them down the road too.

See, looking at the Ashes benchmark, Nvidia was able to get very amazing performance out of DX11. I tend to think they will be able to get more performance out of DX12 as well. They may have to do things differently, but if you look at Star Swarm you can see that their route has allowed them to do some pretty amazing things.

If I learned anything from all this, it is that GCN and Maxwell are different designs and different approaches. There could be an advantage to the route AMD took, one that might even shine if it is exploited. But that doesn't automatically make the other approach without advantages of its own as well. It's just that now we are getting this opportunity to see things on a different level... I guess such huge discussions involving SW at such an early stage would naturally be on a different level.