Architectural Direction of GPUs

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
I requested and was granted the option of creating this thread about architectural differences in the current GPUs, which will be heavily moderated. I am very interested in discussion about the different approaches, how they are playing out and how they will play out, and I think we should be able to handle this without the loyalist comments that usually come into play.

To get it out of the way up front: from an overall architectural standpoint, from everything I have been able to gather, on a computational level the GF100 has a rather clear advantage over the 58xx parts- but it *should*, considering its significant size and power requirement differences. I'm stating this up front so people don't accuse me of taking a slanted angle; all that extra die space, power and heat really are there for a reason, it just isn't going to show up in current games.

With the disclaimers out of the way....

First up, fillrate. The 58xx parts have a massive advantage on this front and it makes itself clearly visible in current games when pushed to the highest resolutions. This is a fairly simple and straightforward observation, but looking into it a bit more, this is the first time that a company has released a new top end part with considerably less texel fillrate than the previous generation. Even the 57xx parts are competitive with the GF100 here; obviously a very distinct divergence has taken place in terms of design philosophy. While we have known for a while that the general direction of games has been shifting the ratio of raw fill to shader ops toward shaders, this is the first time we have seen a company release a part that went to the extreme of actually reducing what was for a long time the defining raw metric for video cards, texel fill. Old timers will likely recall the old 'fillrate is king' mantra, and while that obviously died off some time ago, the thought never even dawned on me that we would head backwards in that area.

To me that indicates a bet on the direction that games will be taking. Heavy usage of elements that chew up raw fill is going to choke, badly, on the GF100 based parts when compared to their 58xx counterparts. Nothing nVidia does with driver updates or anything else is going to fix that. They have a brick wall they are going to run into; there is no getting around it. *IF* games start to take a direction where they are far more reliant on shader ops than raw fill tactics, this choice could look very smart in retrospect. If we stay our current course, it won't end up looking too great for nV. They need the current trend to show a sharp spike for it to pay any sort of appreciable dividends.

Up next, tessellation. Clearly, nVidia dominates this area. ATi has had tessellation for a long time now; old timers will likely recall TruForm doing some interesting things with Counter-Strike back in the days when it was running on Carmack's engine. I think two elements combined to produce the huge disparity in tessellation performance. One, ATi likely thought that with their extensive time and experience having a tessellation engine, they were going to have a superior offering to nV, who was too tied up with compute logic. I would have made that same judgement myself. Two, nV probably thought that ATi was going to push tessellation hard and wanted to go over the top and take that trump card out from under them. In the end, from a consumer point of view, I think the advantage here ends up in ATi's favor, because nV's tessellation advantage is so strong that nV blew a lot of transistors on something developers won't be able to use, or else they will make it unplayable on anything but a GF100 based part (we may see a few outliers like we do today with PhysX).

General shader hardware- in terms of raw throughput, the parts are quite close. In implementation, everyone should be aware that nV devoted a lot more effort to this area, largely due to their making use of it for other purposes. For gaming uses, the biggest advantage I see for nV is that they are more likely to handle *any* shader code at reasonable speed. Their entire layout seems very friendly to less than optimal code; this is another tradeoff. To compare it to CPUs, a general way of looking at it is that ATi is kind of like Cell, extremely potent with the right kind of code but with a severe performance penalty for less than optimal usage. nV is more along the lines of x86, never going to hit the same peak throughput, but going to handle anything you throw at it with reasonable speed (those are very loose comparisons, just trying to give a general idea). This is likely going to play well for nV in terms of getting better performance on the newest titles out of the box, with less reliance on driver optimizations such as compiler tweaks. Conversely, ATi should see the biggest improvements from driver updates for titles throughout the life of their current parts.

GPGPU- Obviously, utter domination by nVidia. I know most people here don't care; I do quite a bit of video transcoding, and there are going to be a lot of CS5 users who find it to be a pretty big deal. A lot of peak throughput numbers get thrown around suggesting the parts are fairly close; in actual usage it looks like the 480 should be close to twice as fast as the 5870 in bad cases for the 480, and somewhere around 600%-1000% faster in situations better suited to it. A hefty amount of the extra transistors they are carrying is because of this; their layout and the way on-chip communication is handled is *far* more CPU-like than any GPU we have seen to date. The cache structure and the ability of the different segments of the chip to communicate with each other, along with lots of other refinements, truly put this on an entirely different level. I don't consider this a slight to ATi at all; they made a choice not to compete in this segment, and in doing so they ended up with a smaller, cooler, earlier chip. Be that as it may, anyone who wants to do anything GPGPU related really has only one viable choice at the moment in the high end segment.
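
For anyone wondering what that transcoding-style GPGPU work actually looks like, here is a tiny, purely hypothetical CUDA sketch (the kernel, the gain value and the frame layout are all made up for illustration); the point is just that every pixel is independent, which is why this class of work scales so well on these parts:

// Hypothetical sketch of the kind of per-pixel work a GPU-accelerated
// transcoder or CS5-style filter can offload: each thread handles one
// sample of a frame, so the work maps cleanly onto thousands of cores.
// Real encoders are far more involved than this.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void adjustBrightness(unsigned char *frame, int numPixels, float gain)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= numPixels) return;
    float v = frame[i] * gain;                      // scale one luma sample
    frame[i] = (unsigned char)(v > 255.0f ? 255.0f : v);
}

int main()
{
    const int w = 1920, h = 1080, numPixels = w * h;   // one 1080p luma plane
    unsigned char *hostFrame = new unsigned char[numPixels];
    for (int i = 0; i < numPixels; ++i)
        hostFrame[i] = (unsigned char)(i % 256);        // toy frame data

    unsigned char *devFrame;
    cudaMalloc(&devFrame, numPixels);
    cudaMemcpy(devFrame, hostFrame, numPixels, cudaMemcpyHostToDevice);
    adjustBrightness<<<(numPixels + 255) / 256, 256>>>(devFrame, numPixels, 1.2f);
    cudaMemcpy(hostFrame, devFrame, numPixels, cudaMemcpyDeviceToHost);
    printf("first adjusted sample: %d\n", hostFrame[0]);
    cudaFree(devFrame);
    delete[] hostFrame;
    return 0;
}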

On this same topic, some of the new GPGPU functionality of this chip actually could be used in games. Hardware recursion is an interesting one, as it makes ray tracing a realistic possibility in terms of added effects moving forward (not full ray tracing, but a hybrid approach to generate very high quality reflections). These types of features, again, won't show up in games anywhere close to the near future, but they are very interesting from a design perspective for where they are leading us.
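
As a rough illustration of what hardware recursion buys you, here is a minimal CUDA sketch of a recursive reflection bounce. Everything in it (the stub intersection test, the shading constants, the names) is made up for illustration, and it only builds for Fermi-class hardware (nvcc -arch=sm_20) since earlier GPUs cannot have a device function call itself:

// Hypothetical sketch: recursive reflection rays, expressible on Fermi-class
// hardware because compute capability 2.0 lets a __device__ function call
// itself. The scene intersection is a stub; no real renderer works this way.
#include <cstdio>
#include <cuda_runtime.h>

struct Ray { float ox, oy, oz, dx, dy, dz; };

// Stub: pretend every ray hits a mirror-like surface and returns a new ray.
__device__ bool intersect(const Ray &in, Ray &bounced, float &shade)
{
    shade = 0.5f;                    // constant placeholder shading term
    bounced = in;
    bounced.dz = -in.dz;             // flip one axis as a fake "reflection"
    return in.oz < 8.0f;             // fake termination condition
}

// Recursive device function: trace a ray, add a dimmer contribution per bounce.
__device__ float traceRay(Ray r, int depth)
{
    Ray bounced;
    float shade;
    if (depth == 0 || !intersect(r, bounced, shade))
        return 0.1f;                 // background / ambient term
    bounced.oz = r.oz + 1.0f;        // march the fake hit point forward
    return shade + 0.5f * traceRay(bounced, depth - 1);   // the recursion
}

__global__ void shadePixels(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    Ray primary = { 0.0f, 0.0f, (float)(i % 8), 0.0f, 0.0f, 1.0f };
    out[i] = traceRay(primary, 4);   // up to 4 reflection bounces
}

int main()
{
    const int n = 256;
    float *d_out;
    cudaMalloc(&d_out, n * sizeof(float));
    shadePixels<<<(n + 63) / 64, 64>>>(d_out, n);
    float h_out[n];
    cudaMemcpy(h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    printf("pixel 0 shades to %f\n", h_out[0]);
    cudaFree(d_out);
    return 0;
}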

This generation is the first major shift away from each other I have seen ATi and nV take in terms of architectural direction. I'm not talking strictly about die size, but the overall ratio of what they are spending transistors on. For nV to reduce their raw fill while pushing an exponential increase in geometric throughput, as a general example, is a very interesting design choice. Perhaps we are seeing nV do what ATi is going to do with their next major redesign, or perhaps this is the general direction both companies are headed in.

I've said it many times and I do firmly believe it: most games today are simply ports of console titles. On that particular front, I see the 58xx parts having a clear architectural advantage over nV. That isn't to say they will always be faster, but in terms of die space/performance they should obliterate nV. Moving forward, if we have games that start making use of heavy tessellation or, and I find this *highly* unlikely, we have games using ray tracing as a post-process type of effect for reflections, it is likely that nV will destroy the 58xx parts, badly (although this would almost certainly end up akin to PhysX; the performance rift would be so severe that people wouldn't compare the parts at all with the features on).

With ATi belonging to AMD, I'm not sure that ATi is going to be interested in heading down the same path that nV is at all. Not saying that they won't, but if they can avoid GPGPU ever taking off, it helps their CPU division while hurting one of their major competitors. Of course, they have the desirable position of also being able to adapt rather quickly and respond if consumer demand starts showing an increased interest level. I would imagine that they would like to at least increase their presence in the segment before Intel fixes the mess that is Larrabee and manages to get something workable out of the design, but that could be years from now.

Clearly, looking long term, we are nearing the point of being completely fill 'complete', although nV backed off before that became a 'wall'. They have also placed considerable resources into elements that may or may not be utilized by the masses at all. Their part is clearly the riskier design, the hotter design, and the larger design. If ATi feels any pressure from nV on the performance front, looking at it from an overall design standpoint, they should be able to release an upclocked part that bests the 480 relatively easily. Actually, in terms of overall gaming performance as it stands now, it seems that any part nV could release on this build process, ATi should be able to beat if they are so inclined (a "simple" respin still costs millions, so keep in mind that is a business decision that does have drawbacks for a company trying to make money). If games start shipping that make use of all the extra transistors that nV has packed its parts with, ATi simply doesn't stand a chance with their current offerings.

Serious discussion only. Folks who troll, bait, provoke, cr@p or otherwise disrupt this thread will be punted. Thanks in advance.
Anandtech Moderator - Keysplayr
 

Blue Shift

Senior member
Feb 13, 2010
272
0
76
Very nice post. I can't help but think that the GF100 architecture, while very innovative, is out-of-place in today's game industry. With all the console ports we're seeing, what are the chances that many games will require the tessellation and general-computing potential of the 400 series cards? TWIMTBP titles could be an exception to this rule, but spending money to help develop games seems counter-productive to a company that's, as you've mentioned, trying to make money. OpenCL applications (such as Dolphin, which I'd like my rig to run at faster speeds) could also be good news, since OpenCL runs on both companies' cards.

However, what about during the next console generation? Who will be making the GPU for the successor to the 360? If it's nVidia, then the GF100 architecture could get the last laugh... Even if it doesn't make a lot of sense for playing today's games on. Still, I doubt that the 480 would be capable of playing the next generation's games even if the nextBox's architecture is Fermi-based. From a gamer's perspective, the timing seems a bit off here.

Edit: The flip-side of this point is that ATI's architecture isn't a great fit for today's gaming industry either. Companies aren't optimizing games to run on ATI's parallel whatsis either, so ATI is being forced to make all these "driver optimizations." Not fun for the software team, and not fun at all for gamers if their stream of driver updates suddenly dries up.
 

EarthwormJim

Diamond Member
Oct 15, 2003
3,239
0
76
How hard is tessellation to implement into already existing engines or games? Would it be feasible to augment console ports with high levels of tessellation to take advantage of GF100?
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
To get it out of the way up front: from an overall architectural standpoint, from everything I have been able to gather, on a computational level the GF100 has a rather clear advantage over the 58xx parts- but it *should*, considering its significant size and power requirement differences. I'm stating this up front so people don't accuse me of taking a slanted angle; all that extra die space, power and heat really are there for a reason, it just isn't going to show up in current games.

Speaking of heat/power/die space, how much does "double precision" affect those totals?

Can "double precision" be made useful for any particular genre of game?

First up, fillrate. The 58xx parts have a massive advantage on this front and it makes itself clearly visible in current games when pushed to the highest resolutions. This is a fairly simple and straightforward observation, but looking into it a bit more, this is the first time that a company has released a new top end part with considerably less texel fillrate than the previous generation. Even the 57xx parts are competitive with the GF100 here; obviously a very distinct divergence has taken place in terms of design philosophy. While we have known for a while that the general direction of games has been shifting the ratio of raw fill to shader ops toward shaders, this is the first time we have seen a company release a part that went to the extreme of actually reducing what was for a long time the defining raw metric for video cards, texel fill. Old timers will likely recall the old 'fillrate is king' mantra, and while that obviously died off some time ago, the thought never even dawned on me that we would head backwards in that area.

To me that indicates a bet on the direction that games will be taking. Heavy usage of elements that chew up raw fill is going to choke, badly, on the GF100 based parts when compared to their 58xx counterparts. Nothing nVidia does with driver updates or anything else is going to fix that. They have a brick wall they are going to run into; there is no getting around it. *IF* games start to take a direction where they are far more reliant on shader ops than raw fill tactics, this choice could look very smart in retrospect. If we stay our current course, it won't end up looking too great for nV. They need the current trend to show a sharp spike for it to pay any sort of appreciable dividends.

Xtors used for "fillrate" vs xtors used for "shaders".....which one is used in HPC? Shaders right?

Can xtors dedicated to "fillrate" be effectively used for HPC? If so, how much?

Up next, tessellation. Clearly, nVidia dominates this area. ATi has had tessellation for a long time now; old timers will likely recall TruForm doing some interesting things with Counter-Strike back in the days when it was running on Carmack's engine. I think two elements combined to produce the huge disparity in tessellation performance. One, ATi likely thought that with their extensive time and experience having a tessellation engine, they were going to have a superior offering to nV, who was too tied up with compute logic. I would have made that same judgement myself. Two, nV probably thought that ATi was going to push tessellation hard and wanted to go over the top and take that trump card out from under them. In the end, from a consumer point of view, I think the advantage here ends up in ATi's favor, because nV's tessellation advantage is so strong that nV blew a lot of transistors on something developers won't be able to use, or else they will make it unplayable on anything but a GF100 based part (we may see a few outliers like we do today with PhysX).

I like how Fermi is very strong in tessellation, but how will this affect game development in the future?

Will we see low quality models being tessellated out to high quality models (reducing the need for high speed GDDR)? Or will game developers start off with high quality models and tessellate them out to super quality models?

If "low quality tessellated out to high quality" becomes the standard will we see Nvidia with their own "Fusion chip" or "APU" possibly relying strictly on system memory (even for gaming)? Could such a chip be effectively deployed in HPC to the point where discrete Video card become less desirable? I am under the impression HPC doesn't need high amount of bandwidth in same way Graphics applications do? (Please correct me if I am wrong because I am not in the IT industry and my understanding is quite limited)

GPGPU- Obviously, utter domination by nVidia. I know most people here don't care; I do quite a bit of video transcoding, and there are going to be a lot of CS5 users who find it to be a pretty big deal. A lot of peak throughput numbers get thrown around suggesting the parts are fairly close; in actual usage it looks like the 480 should be close to twice as fast as the 5870 in bad cases for the 480, and somewhere around 600%-1000% faster in situations better suited to it. A hefty amount of the extra transistors they are carrying is because of this; their layout and the way on-chip communication is handled is *far* more CPU-like than any GPU we have seen to date. The cache structure and the ability of the different segments of the chip to communicate with each other, along with lots of other refinements, truly put this on an entirely different level. I don't consider this a slight to ATi at all; they made a choice not to compete in this segment, and in doing so they ended up with a smaller, cooler, earlier chip. Be that as it may, anyone who wants to do anything GPGPU related really has only one viable choice at the moment in the high end segment.

On this same topic, some of the new GPGPU functionality of this chip actually could be used in games. Hardware recursion is an interesting one, as it makes ray tracing a realistic possibility in terms of added effects moving forward (not full ray tracing, but a hybrid approach to generate very high quality reflections). These types of features, again, won't show up in games anywhere close to the near future, but they are very interesting from a design perspective for where they are leading us.

I don't use Adobe CS myself, but I have read rumors that CS5 will be optimized for CUDA? Is this true?

What did CS4 use? OpenCL? Or something else?
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
I can't help but think that the GF100 architecture, while very innovative, is out-of-place in today's game industry.

While I wouldn't disagree with that, it seems a lot like the original GeForce. It too was very out of place with the game industry when it released. If 3dfx hadn't been late with the Voodoo5, it wouldn't have looked good from a price/performance perspective either. And then Giants came out and gave us our first look at where the industry was headed, and all of a sudden the GeForce made a lot more sense. Now to be fair, the GeForce2 was already out prior to Giants, but if the GeForce hadn't existed, Giants' development couldn't have finished up. When looking at what Fermi is, it seems that this is nVidia's long term strategy with this part. Be competitive for what is out today, determine what is out tomorrow.

TWIMTBP titles could be an exception to this rule, but spending money to help develop games seems counter-productive to a company that's, as you've mentioned, trying to make money.

In a way, it is the inverse of the console industry. They will lose money on hardware to make money on software. nV has for some time been willing to lose money on software to make money on hardware. That doesn't require a crystal ball to see, they have been doing it for years. While it is an assumption, the fact that they have maintained ~2/3 of the entire discrete market while demanding a price premium for years is likely a good indicator that it is working.

However, what about during the next console generation? Who will be making the GPU for the successor to the 360?

The next Xbox contract will almost assuredly go to ATi. I think nVidia has a more vested interest in landing the next PlayStation contract, honestly. Given that they already have a dominant position in PC gaming, MS can't afford to completely shut them out. nV landed the contract for the next Nintendo handheld based on currently available information, which they may try to leverage for a set-top box, but that is highly unlikely, as every one of Nintendo's 3D systems has had its rasterizer/GPU designed by the same team (ATi bought out ArtX, who handled the N64). With the market sitting as it is currently, landing in Sony's platform gives them the best potential to influence the broader gaming market (not to mention that if current trends hold, the PS3 will end up outselling the 360 by a rather hefty amount).

Edit: The flip-side of this point is that ATI's architecture isn't a great fit for today's gaming industry either. Companies aren't optimizing games to run on ATI's parallel whatsis either, so ATI is being forced to make all these "driver optimizations." Not fun for the software team, and not fun at all for gamers if their stream of driver updates suddenly dries up.

I don't think you have to worry about the driver updates drying up; it is too important to ATi's current overall strategy. This does create other issues: with their driver team focused so heavily on Windows, alternate platforms are going to suffer. You can see this currently by looking at the Linux market; pretty much it's nV or don't bother. Granted, that isn't a major platform for gaming, but it is another area that forces some people to buy nV, not through underhanded business tactics, but simply due to the fact that they are the only ones servicing that market.

How hard is tessellation to implement into already existing engines or games?

Very easy, but to improve things using it, not so much. Tessellation requires a lot of asset considerations to be implemented in a positive way. If you simply 'turn on' tessellation you can end up with some horribly disfigured models running around; they are so bad on occasion it is actually rather comical. Given that you are talking about reworking art assets to do it properly, the odds of it being patched into a game are very slim unless most of the work was done on a title prior to it shipping and they just need to work some bugs out. I'm not saying it isn't possible, but it isn't realistic to expect.

Would it be feasible to augment console ports with high levels of tessellation to take advantage of GF100?

Yes, and the same could be said of the 58xx parts too. While the GF100 parts have a very large advantage in tessellation performance, the 58xx parts can still handle a lot more than their console counterparts. It is within reason to see a TWIMTBP title ship with tessellation levels that will only reasonably run on GF100 parts, with a lower setting that can be used on the 460/450/5870/5850 offerings. I would expect those titles to be TWIMTBP, and I honestly would think nVidia would want to make sure that tessellation would run on ATi hardware in these cases, as it will only make them look good.

Speaking of heat/power/die space, how much does "double precision" affect those totals?

Not as much as you might think, but it has an impact. Overall DP would account for a relatively small amount of the additional xtors. Not trivial, but not a major factor.

Xtors used for "fillrate" vs xtors used for "shaders".....which one is used in HPC? Shaders right?

Yes, although honestly at this point nV's shader hardware is so general purpose that their use of the term 'CUDA cores' is really more accurate description than hype.

Can xtors dedicated to "fillrate" be effectively used for HPC? If so, how much?

To put it as simply as possible, no. Fillrate is determined by portions of a chip that share more in common with a DSP than a CPU (you could argue that they are a DSP, actually).
 

extra

Golden Member
Dec 18, 1999
1,947
7
81
I have a question then hmmm....Is it possible that the lower end cut down parts in the GF100 series won't necessarily lose as much fill rate as you'd think?

Both companies' strategies make sense when you think about it, and there is room for both. AMD wants to tackle the gamer market with a top to bottom lineup of good cards, and they want to be able to scale that same design easily from a high end dual chip card down to a slimmed down form on the same die as a CPU. Nvidia makes a lot of their profits in the compute sector, and as long as they can make a card that is still "good enough" that gamers will buy 'em, then that's great...

Both cards seem to have "enough" tessellation power to make things look fantastic. Nvidia's approach seems to be great for the high end cards, but I suspect that at the low end ATI's approach will be better. . .

And you spoke of photoshop cs5...I wonder what all will be accelerated. Wonder how much my 5770 will be able to help. The idea excites me. The idea of say...ACR/Lightroom and DPP using the GPU to be able to fly through raw conversions at many times today's speeds excites me. Same with noise reduction algorithms. I do batch neat image work quite a bit. If that could be GPU accelerated. And surface blur... o_O. It's going to be a great year.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
I am having a hard time believing two things: (maybe you can help me)

1. Why ATI wouldn't push tessellation even harder than Nvidia? Wouldn't this help "System on a chip" work better for gaming? (Leaving add-on Discrete cards possibly using shared frame rendering for higher fill-rate/bandwidth situations like Eyefinity)

2. Why Nvidia wouldn't want to produce a large die Fusion/APU of their own? Couldn't this lower costs and improve compute density for HPC? (I realize I am a layman trying to understand a very complex situation.)
 

konakona

Diamond Member
May 6, 2004
6,285
1
0
Heavy usage of elements that chew up raw fill is going to choke, badly, on the GF100 based parts when compared to their 58xx counterparts.
Clearly, looking long term, we are nearing the point of being completely fill 'complete'
Provided that the resolutions on our current and soon to be released displays stay at the same general level of 1080p to 2560x1600 at most, barring Eyefinity, do you see any other 'consumers' for more abundant fillrate (higher quality AA? is that even necessary any more)?

I remember being quite awed by the DMZG demo when the first GeForce came out :) Do you know of any similar publicly available showcase of GF100's strength in the gaming context - something revolutionary that can only be done with a GTX 480?

The possibility of adopting even a limited amount of ray tracing/tessellation sounds quite intriguing and innovative. If it does happen, how soon do you think? Or would this also be up to what happens in the next gen consoles? (And how far away was that going to be again?)
 

ViRGE

Elite Member, Moderator Emeritus
Oct 9, 1999
31,516
167
106
Up next, tessellation. Clearly, nVidia dominates this area. ATi has had tessellation for a long time now; old timers will likely recall TruForm doing some interesting things with Counter-Strike back in the days when it was running on Carmack's engine. I think two elements combined to produce the huge disparity in tessellation performance. One, ATi likely thought that with their extensive time and experience having a tessellation engine, they were going to have a superior offering to nV, who was too tied up with compute logic. I would have made that same judgement myself. Two, nV probably thought that ATi was going to push tessellation hard and wanted to go over the top and take that trump card out from under them. In the end, from a consumer point of view, I think the advantage here ends up in ATi's favor, because nV's tessellation advantage is so strong that nV blew a lot of transistors on something developers won't be able to use, or else they will make it unplayable on anything but a GF100 based part (we may see a few outliers like we do today with PhysX).
I suppose this is technically correct, but I can't help but feel it misses the point.

In NVIDIA's vision, tessellation is quite easy to take advantage of. Once you're writing games that only support DX11 and above, you can completely change how you create your models. Instead of creating a very high detail model and then shipping various lower quality derivatives, you create the original high detail model and a single base model. From here you use displacement maps and tessellation to scale that base model up to what the hardware can do. A weak card uses a lower resolution displacement map and tessellation factor, while a high end card uses a high resolution displacement map and high tessellation factor.

The point being that if you go this route, you can very easily scale your assets up to what Fermi can do. Fermi level tessellation is just an even higher level of tessellation and an even higher resolution displacement map, both of which can easily be generated from your original high quality asset. Plus it works well with AMD hardware too - AMD may not be able to match NV's tessellation performance, but you only have to use a slightly lower quality level to achieve reasonable performance. This is quite different from PhysX where the fallback mechanism is a massive dropoff in performance in moving from GPU physics to CPU physics.

So if developers go this route, I disagree that NV won't be allowed to flex their tessellation muscles (and in reality, even if you use today's bolt-on methods, all of this still applies). As long as your assets are of high enough quality in the first place, then it's going to be very easy to take advantage of Fermi's extra tessellation abilities. Tessellation is ultimately scalable; there's no risk in creating something that is only of value on a limited subset of hardware, because you can always scale it down.

NVIDIA's biggest risks are that developers don't create assets high enough in quality (which from what I understand is rare, since the high quality is necessary for SSAO, bumpmapping, and such) or that developers refuse to include higher quality tessellation maps due to a desire to keep game sizes down.
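
As a rough sketch of that "one high quality source, any tessellation factor" idea: in a real DX11 title this work lives in the fixed-function tessellator and the hull/domain shaders, but the CUDA-flavored toy below (the heightmap, patch layout and factor values are all made up) shows how the same displacement data can be evaluated at whatever vertex density the hardware can afford:

// Hypothetical sketch of "one displacement map, any tessellation factor":
// a flat quad patch is subdivided into (factor + 1)^2 vertices and each new
// vertex is pushed upward by a height sampled from the same displacement map.
// In a real DX11 title the domain shader does this; CUDA is used here purely
// for illustration.
#include <cstdio>
#include <cuda_runtime.h>

__device__ float sampleHeight(const float *map, int w, int h, float u, float v)
{
    // nearest-neighbour sample of the displacement map, clamped to the edges
    int x = min(max(int(u * (w - 1) + 0.5f), 0), w - 1);
    int y = min(max(int(v * (h - 1) + 0.5f), 0), h - 1);
    return map[y * w + x];
}

__global__ void tessellatePatch(float3 *outVerts, const float *dispMap,
                                int mapW, int mapH, int factor, float scale)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   // vertex index along u
    int j = blockIdx.y * blockDim.y + threadIdx.y;   // vertex index along v
    int verts = factor + 1;
    if (i >= verts || j >= verts) return;

    float u = (float)i / factor;                     // parametric position
    float v = (float)j / factor;
    float height = sampleHeight(dispMap, mapW, mapH, u, v);

    // flat unit quad in the xz plane, displaced along +y by the map
    outVerts[j * verts + i] = make_float3(u, height * scale, v);
}

int main()
{
    const int mapW = 4, mapH = 4;
    float h_map[mapW * mapH];
    for (int k = 0; k < mapW * mapH; ++k)
        h_map[k] = 0.1f * k;                         // toy displacement map

    float *d_map;
    cudaMalloc(&d_map, sizeof(h_map));
    cudaMemcpy(d_map, h_map, sizeof(h_map), cudaMemcpyHostToDevice);

    int factors[2] = { 8, 64 };                      // "mainstream" vs "GF100-class"
    for (int f = 0; f < 2; ++f) {
        int factor = factors[f];
        int verts = factor + 1;
        float3 *d_out;
        cudaMalloc(&d_out, verts * verts * sizeof(float3));
        dim3 block(16, 16), grid((verts + 15) / 16, (verts + 15) / 16);
        tessellatePatch<<<grid, block>>>(d_out, d_map, mapW, mapH, factor, 1.0f);
        cudaDeviceSynchronize();
        printf("factor %2d -> %5d vertices from the same displacement map\n",
               factor, verts * verts);
        cudaFree(d_out);
    }
    cudaFree(d_map);
    return 0;
}

A mainstream part might run the equivalent of the low factor while a GF100-class part runs the high one, and both read the same source asset, which is the scalability ViRGE is describing.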
 

MarcVenice

Moderator Emeritus
Apr 2, 2007
5,664
0
0
I have two problems with the first post. You claim Nvidia is much faster with tessellation than AMD. But it has been said before that this only holds true when the GPU only has to do tessellation. When we look at games like Metro 2033 and DiRT 2, which use tessellation, we see both GPUs lose quite a bit of performance. Even though Nvidia is, as you claim, a lot better at tessellation? I claim it's not; only its theoretical tessellation throughput destroys ATI's GPUs, but in actual games it levels out.

You also claim Nvidia has an advantage in the GPGPU sector. They are clearly pushing it, and CUDA is obviously far more advanced than anything ATI has to offer. We haven't really had the opportunity to compare both GPUs on a fair level though. I've yet to see numbers between said GPUs running something through OpenCL.

So I think Nvidia has the advantage in GPGPU computing on a software level. And yes, I also predict their architecture is better suited for it than ATI's, but I have yet to see it. That could be taken two ways really, of which the worst for ATI would be that Nvidia is doing something for GPGPU users (scientists etc.) and ATI isn't, which would make any comparisons kind of moot, since ATI doesn't care?

I think you should also take the market into consideration. Nvidia sells GPUs both to gamers and to the professional market. Their architecture seems to show that they are focusing more on professionals. What if GF100 is the next step: a hybrid for gaming and GPGPU apps? They need the high volume of sales gamers provide, and then make real profit selling $1500 cards to professionals. GF200 is GPGPU only? Because obviously the extra transistors for GPGPU cost a lot of money and suck power in the end. A few watts might not be a big deal for a gamer, but if a company uses 5,000 cards for parallel computing, a few watts will make a difference...
 

Skurge

Diamond Member
Aug 17, 2009
5,195
1
71
One review I read where the 480 destroyed the 5870 in the stalker sun shafts test said it used a form of ray-tracing.

Could someone tell me if that's true or not.
 

MarcVenice

Moderator Emeritus
Apr 2, 2007
5,664
0
0
One review I read where the 480 destroyed the 5870 in the stalker sun shafts test said it used a form of ray-tracing.

Could someone tell me if that's true or not.

False. I thought it was an April 1st joke...
 

nosfe

Senior member
Aug 8, 2007
424
0
0
I don't use Adobe CS myself, but I have read rumors that CS5 will be optimized for CUDA? Is this true?

What did CS4 use? OpenCL? Or something else?

Only Premiere Pro uses CUDA; Photoshop is still using just OpenGL. Also, Adobe said that they wanted to implement OpenCL but it wasn't ready for prime time when they started working on CS5, so I wouldn't be surprised if they switched to OpenCL for CS6.
 

Voo

Golden Member
Feb 27, 2009
1,684
0
76
Nice post; good but easy to understand explanations of the differences.

With ATi belonging to AMD, I'm not sure that ATi is going to be interested in heading down the same path that nV is at all. Not saying that they won't, but if they can avoid GPGPU ever taking off, it helps their CPU division while hurting one of their major competitors. Of course, they have the desirable position of also being able to adapt rather quickly and respond if consumer demand starts showing an increased interest level. I would imagine that they would like to at least increase their presence in the segment before Intel fixes the mess that is Larrabee and manages to get something workable out of the design, but that could be years from now.
Well, the whole reason why AMD bought ATi was to get some competitive advantage against Intel in a field where both of them had no real knowledge. Just look how much they invested in Fusion and how heavily they promote it.
Intel also seems to have understood how important that market will be - and has shown how hard it is to develop large GPUs. Hi, Larrabee.


Other than that, I think I can add a little bit to the GPGPU part, especially about one of the advantages of the Nvidia architecture: as a lot of people here will already know, GPUs are probably the best example of a SIMD architecture (http://en.wikipedia.org/wiki/Flynn's_taxonomy). As the name implies, you have one instruction (let's say an add) and multiple data streams on which that instruction is executed.
So as you can see, a SIMD architecture is perfect for parallel data processing, but it's also very easy to hurt its performance badly.
One example:
for (i = 0; i < N; i++) {
    if (x[i] < y[i])
        z[i] = x[i] + y[i];
    else
        z[i] = x[i] - y[i];
}

To compute that on a SIMD architecture you:
1. compute the predicates (x[i] < y[i]) for all threads
2. all those threads that fulfill the predicate compute something, the others do nothing
3. repeat step 2, but this time for the threads that don't fulfill the predicate

This is especially bad if the predicate is only false for a minority of threads. So what does Nvidia do about this? They group threads into groups and have active flags for those groups. E.g. if the predicate is identical for all threads in a group, the unnecessary pass won't be scheduled for that group at all.

So while conditional execution is always a bad idea on any SIMD machine, where it cannot be avoided we get programming convenience at the cost of a hard-to-predict loss of efficiency (if we're unlucky, one thread in every group has to compute the opposite of the rest).
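
To make that concrete, here is the same branch written as a small, self-contained CUDA program (all names and data are made up); on Nvidia hardware the "groups" are 32-thread warps, so the kernel only pays the divergence penalty inside warps where both sides of the branch actually get taken:

// Each thread handles one element. A 32-thread warp whose threads all take
// the same side of the branch executes only that side; a mixed warp executes
// both sides with part of the warp masked off each time.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void addOrSub(const float *x, const float *y, float *z, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (x[i] < y[i])
        z[i] = x[i] + y[i];        // threads where the predicate is true
    else
        z[i] = x[i] - y[i];        // threads where the predicate is false
}

int main()
{
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *hx = new float[n], *hy = new float[n], *hz = new float[n];
    for (int i = 0; i < n; ++i) {
        // i % 64 < 32 keeps every 32-thread warp on one side of the branch;
        // change this to i % 2 and every warp has to execute both sides.
        hx[i] = (float)(i % 64);
        hy[i] = 32.0f;
    }
    float *dx, *dy, *dz;
    cudaMalloc(&dx, bytes); cudaMalloc(&dy, bytes); cudaMalloc(&dz, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);
    addOrSub<<<(n + 255) / 256, 256>>>(dx, dy, dz, n);
    cudaMemcpy(hz, dz, bytes, cudaMemcpyDeviceToHost);
    printf("z[0] = %f  z[40] = %f\n", hz[0], hz[40]);
    cudaFree(dx); cudaFree(dy); cudaFree(dz);
    delete[] hx; delete[] hy; delete[] hz;
    return 0;
}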
 

Genx87

Lifer
Apr 8, 2002
41,095
513
126
I get this feeling Fermi is like the original GeForce 256 and NV30: two designs that at the time weren't spectacular performance-wise but were ahead of their time from a feature perspective. The dividends those two designs paid back to Nvidia down the road were huge. However, consoles were different back then. I fear that while Nvidia is pushing this forward, game developers are being held back by five-year-old technology in a console. So on the PC gaming side we won't see the fruits for years, unless we get aggressive devs or Nvidia does it themselves via TWIMTBP.

I personally believe Fermi is a new direction for Nvidia. They realize they won't be in the CPU business, Intel and AMD are going to attempt to build unified platforms with decent enough graphics performance, and the trend is console ports; the high end PC gaming scene is looking bleak. To supplement their income, HPC is a new target, and one that AMD and Intel have a vested interest in not attacking with their GPUs because their CPU divisions would melt. Only time will tell if this works, but I think they have a very compelling product with Fermi. There is nothing out there that will give the performance/watt in HPC that a Tesla will provide.

Now onto fillrate vs shaders. Couldn't these GPUs build out enough fillrate to give, say, 60fps at any resolution and then build a shader backend to hit that fps? I know it sounds weird to say this, but wouldn't we rather have 60fps max/min at a given resolution than 120fps max, 30fps min?
 

Fox5

Diamond Member
Jan 31, 2005
5,957
7
81
OpenCL applications (such as Dolphin, which I'd like my rig to run at faster speeds) could also be good news, since OpenCL runs on both companies' cards.

Dolphin supports OpenCL, but doesn't really use it for anything. It's used to speed up texture loading, but the time it takes to load an OpenCL kernel is usually greater than the texture loading took in the first place. Dolphin is heavily CPU limited; you need an Intel Core i-series processor at high clock speeds to handle it.

Who will be making the GPU for the successor to the 360? If it's nVidia, then the GF100 architecture could get the last laugh... Even if it doesn't make a lot of sense for playing today's games on. Still, I doubt that the 480 would be capable of playing the next generation's games even if the nextBox's architecture is Fermi-based.

Rumor has it that ATI will be doing it again. However, I wouldn't count on next gen's consoles faring as well as this gen's; video cards are rapidly reaching power and heat levels that a console can't touch.

How hard is tessellation to implement into already existing engines or games? Would it be feasible to augment console ports with high levels of tessellation to take advantage of GF100?

It's apparently not all that hard, but it requires a game to target DX11 when most console ports target DX9 (which I think is what the Xbox 360 dev kit automatically outputs, so a DX11 path requires additional effort). It also requires higher quality assets than exist in the game for the tessellation to work well.

There's a limit to the effectiveness of tessellation, however. In-game models are already pretty well optimized for polygon counts, and shaders already give the illusion of depth. Aliens vs. Predator isn't very aggressive with tessellation, but already achieves polygons near the size of pixels; I don't think you'll go far beyond that. The bigger issue is probably reducing aliasing and shimmering; a high-poly mesh can be anti-aliased and filtered better than a very large and complex textured/shaded surface. The top end Fermi appears to have ~5x the tessellation performance of the 5870, but polygon counts have already reached the level of diminishing returns, and a game developer would be crazy to put out something a 5870 can't handle.

I don't use Adobe CS myself, but I have read rumors that CS5 will be optimized for CUDA? Is this true?

What did CS4 use? OpenCL? Or something else?

It's true. CS4 used the CPU for most things, OpenGL for a few effects, and had a few CUDA plugins. IIRC, Nvidia limited the CUDA effects to working only on Quadro cards, and I think the same may happen with CS5.

When looking at what Fermi is, it seems that this is nVidia's long term strategy with this part. Be competitive for what is out today, determine what is out tomorrow.

The original Radeon and the Radeon 8500 outpaced Nvidia in features. Even the 9700 Pro did, until the GeForce FX launched late, hot, and power hungry. From that perspective, the company that overreaches with features stumbles.

I don't think you have to worry about the driver updates drying up; it is too important to ATi's current overall strategy. This does create other issues: with their driver team focused so heavily on Windows, alternate platforms are going to suffer. You can see this currently by looking at the Linux market; pretty much it's nV or don't bother. Granted, that isn't a major platform for gaming, but it is another area that forces some people to buy nV, not through underhanded business tactics, but simply due to the fact that they are the only ones servicing that market.

Linux 3D built up around Nvidia, and so apps rely heavily on Nvidia's proprietary extensions as well as the graphics framework Nvidia laid out. (Nvidia replaces a large part of the Linux graphics stack, because it sucked back when Nvidia originally developed their driver.)

Even though Nvidia is, as you claim, a lot better at tessellation? I claim it's not; only its theoretical tessellation throughput destroys ATI's GPUs, but in actual games it levels out.

Games don't use high levels of tessellation yet; Nvidia's demos and the Unigine benchmark do. IMO, tessellation is a waste after a certain level: using it at all can provide an order of magnitude of difference, but it's going to get hard to tell the difference after that.

You also claim Nvidia has an advantage in the GPGPU sector. They are clearly pushing it, and CUDA is obviously far more advanced than anything ATI has to offer. We haven't really had the opportunity to compare both GPUs on a fair level though. I've yet to see numbers between said GPUs running something through OpenCL.

Nvidia does seem to have a big advantage. Not that things can't run fast on ATI (perhaps even faster than Fermi), but more types of code will run fast on Fermi. By the time it takes off, though, ATI might be able to produce their own Fermi-like architecture.
 

Madcatatlas

Golden Member
Feb 22, 2010
1,155
0
0
Rumor has it that ATI will be doing it again. However, I wouldn't count on next gen's consoles faring as well as this gen's; video cards are rapidly reaching power and heat levels that a console can't touch.

I quote this because I want to point out that the reverse SHOULD be the case when talking heat, and ATI has managed this while nVidia hasn't.

Consoles will have their own versions of gpu architecture, like always.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
I fear that while Nvidia is pushing this forward, game developers are being held back by five-year-old technology in a console. So on the PC gaming side we won't see the fruits for years, unless we get aggressive devs or Nvidia does it themselves via TWIMTBP.

It would be nice to see Nvidia making a profit on those TWIMTBP games. Then maybe we would see more of them?

The high end PC gaming scene is looking bleak.

True.

Now onto fillrate vs shaders. Couldn't these GPUs build out enough fillrate to give, say, 60fps at any resolution and then build a shader backend to hit that fps? I know it sounds weird to say this, but wouldn't we rather have 60fps max/min at a given resolution than 120fps max, 30fps min?

Sounds interesting. In the AnandTech "Exploring Input Lag Inside and Out" article, the author makes a point that pixel shaders are the bottleneck in a video card's calculation speed.
 

cbn

Lifer
Mar 27, 2009
12,968
221
106
Rumor has it that ATI will be doing it again. However, I wouldn't count on next gen's consoles faring as well as this gen's; video cards are rapidly reaching power and heat levels that a console can't touch.

The fact that PC Video cards are so much more powerful than the Console GPUs is probably part of the problem.

1080p is becoming the standard in LCD TVs these days.

This begs the question: Why do PC video cards need to be so much more powerful than a Console? Are 2560x1600 LCD users the only ones who will want a discrete card?
 
Nov 26, 2005
15,093
312
126
So who holds the key to where the architectural designs should flow? I see nV's move as a guess at a foreseen future, while ATI pushes their architecture to minimize design cost while pushing the areas that today's arena of gamers demands. The truth behind most of it is that $$$ is the bottom line that dictates the poker move.

Great OP btw, thanks much!!! It's a subject that will be more educational than anything else :)
 

blanketyblank

Golden Member
Jan 23, 2007
1,149
0
0
Even if games started using the extra transistors that NV has packed into their cards, consumers aren't necessarily going to buy them, since those effects are rather small and the price differential could be pretty big. Nintendo can still sell a lot of games and consoles even though its hardware is inferior to Microsoft's and Sony's. Mainstream consumers are willing to live with inferior quality as long as it meets a certain threshold, and honestly, who would care how realistic the reflections are on your gun in an FPS if it costs you a lot more money for the hardware or reduces your fps to a crawl?

Also, once you factor in price, the comparison between the two starts changing, since a single 480 costs about the same as 2 x 5850s. Thus whatever performance difference NV can achieve due to its extra transistors could be made up for by using multiple GPUs. This seems especially important for things like tessellation, since 2 x 5770s far outpace a single 5870 in the Heaven benchmark even though performance is very close in games.

Thus, if money were no object and developers started to make use of their cards, then NV's strategy would pay off; but realistically the only benefit to their new architecture is selling to the profitable HPC market. If they really wanted the consumer gaming market, a better strategy would be paying off or buying game developers to make games solely for their hardware, like Microsoft, Sony, and Nintendo do.
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
Provided that the resolutions on our current and soon to be released displays stay at the same general level of 1080p to 2560x1600 at most, barring Eyefinity, do you see any other 'consumers' for more abundant fillrate (higher quality AA? is that even necessary any more)?

SSAA chews up a lot of fillrate. I think that moving forward nV isn't going to continue to drop fillrate levels; in fact I expect them to increase it again, just with a more pronounced shift toward shader power and with shader power continuing to grow faster relative to raw fill.

I have a question then hmmm....Is it possible that the lower end cut down parts in the GF100 series won't necessarily lose as much fill rate as you'd think?

It is possible, but the relative shader to fill performance ratio isn't likely to change much due to how the chips are designed.

I suppose this is technically correct, but I can't help but feel it misses the point.

I don't disagree with anything you posted; pretty much what I was getting at is that nVidia will end up with their high end parts running a level of tessellation that the ATi parts won't be using, which in terms of charts isn't going to help them out much. We have seen reviewers marginalize the effects of PhysX, which in certain titles are rather dramatic; they will certainly marginalize any advantage nV has using tessellation models with higher levels of complexity, as the differences will be far more subtle.

I have two problems with the first post. You claim Nvidia is much faster with tessellation than AMD. But it has been said before that this only holds true when the GPU only has to do tessellation. When we look at games like Metro 2033 and DiRT 2, which use tessellation, we see both GPUs lose quite a bit of performance.

http://www.anandtech.com/show/2977/...x-470-6-months-late-was-it-worth-the-wait-/16

Until fillrate starts to become an issue, the 480 is tied with the 5970 and the 470 is faster than the 5870 (CPU limited at ~126FPS in this title) in DiRT 2. Here are another couple of examples of the tessellation story at work; although they are pure synthetics, they give an idea of the strength of the parts-

http://ixbtlabs.com/articles3/video/gf100-2-p11.html

If you look at the water tessellation test, which is also running a reasonably complex shader routine, the ATi parts suffer much steeper drop-offs than the 480. You can tell by looking at the other synthetic tests that the GF100's theoretical edge in pure tessellation isn't as pronounced when it is running shader code on top of tessellation, but even when utilizing both, the GF100 core is still clearly dominating in tessellation performance (that is backed up in DiRT 2 as well, until fillrate limitations start to come into play).

We haven't really had the opportunity to compare both GPUs on a fair level though. I've yet to see numbers between said GPUs running something through OpenCL.

http://www.anandtech.com/show/2977/...tx-470-6-months-late-was-it-worth-the-wait-/6

The first two benches are OpenCL. The 5870 is still bested by a decent amount by a 285, and it's not in the same league as the 480. I don't think this is really going to surprise anyone though; nV spent a hell of a lot of resources to make sure it happened this way, and they are paying the penalties for it in their end product (size/heat/noise).

Because obviously the extra transistors for GPGPU cost a lot of money and suck power in the end. A few watts might not be a big deal for a gamer, but if a company uses 5,000 cards for parallel computing, a few watts will make a difference...

While I agree with that entirely, I would also point out that the performance per watt on Fermi still decimates anything else available at this point in time. In no way am I discounting industry's desire to conserve power on these parts, but 1 Fermi at 300 watts is still a lot better than 20 i7s even at 30 watts each (it would really be more like 50-100 i7s at ~90 watts each).
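
Putting those numbers together: even the charitable case is 20 x 30 W = 600 W of CPUs against one 300 W Fermi, and the more realistic 50-100 i7s at ~90 W each works out to roughly 4,500-9,000 W, somewhere around 15-30 times the power draw for comparable throughput.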

It's true. CS4 used the CPU for most things, OpenGL for a few effects, and had a few CUDA plugins. IIRC, Nvidia limited the CUDA effects to working only on Quadro cards, and I think the same may happen with CS5.

Right now they are saying GTX285, any Fermi product or Quadros. Not sure if you'll be able to get it to work on other GTX2xx parts or not, but they are supporting at least some GeForce SKUs this time.

The original Radeon and the Radeon 8500 outpaced Nvidia in features. Even the 9700 Pro did, until the GeForce FX launched late, hot, and power hungry. From that perspective, the company that overreaches with features stumbles.

The GeForce had a staggering feature lead over everything else when it launched; the Radeon was very close to the feature support of the original GeForce. Those two parts can be viewed as the offerings that killed off the rest of the market entirely. Not long after their respective launches, every other player in the market folded up or was marginalized, never to be heard from in a competitive manner again.

There's a limit to the effectiveness of tessellation, however. In-game models are already pretty well optimized for polygon counts, and shaders already give the illusion of depth. Aliens vs. Predator isn't very aggressive with tessellation, but already achieves polygons near the size of pixels; I don't think you'll go far beyond that.

I don't disagree with that at all, but I would point out that right now it is still mainly character models; environments are still rather lacking in geometric complexity.

Rumor has it that ATI will be doing it again. However, I wouldn't count on next gen's consoles faring as well as this gen's; video cards are rapidly reaching power and heat levels that a console can't touch.

Not disagreeing with this at all, but Fermi on 22nm would almost certainly be a much stronger part for consoles than a 5870 on 22nm (I expect ATi to have a completely different architecture in place by then; just pointing out that a process change can cure a lot for a console part, and yes, I know you know that, I'm pointing it out for others reading the thread :) ).

Now onto fillrate vs shaders. Couldn't these GPUs build out enough fillrate to give, say, 60fps at any resolution and then build a shader backend to hit that fps?

With how much overdraw and how much foliage? Therein lies the problem. Every part released today, even the lowest end offerings, has the raw fillrate to handle a basic scene, even on a 30" display, at 60FPS without trouble. When you start adding up the overdraw and all the extra effects using multiple layers, then things start to get tricky.

So who holds the key to where the architectural designs should flow? I see nV's move as a guess at a foreseen future, while ATI pushes their architecture to minimize design cost while pushing the areas that today's arena of gamers demands.

And this is the big gamble. Watching how it all plays out both short term and long term will determine if bets were placed properly. It is well within reason that both companies made the proper choices; if both ATi and nV remain profitable and reach the goals they were shooting for, then they both did their part. Solutions targeting more immediate needs and solutions skewing more toward future demands are both areas that the market has proven it supports in the past. I don't expect that that has changed.

Thus, if money were no object and developers started to make use of their cards, then NV's strategy would pay off; but realistically the only benefit to their new architecture is selling to the profitable HPC market.

I find absolutely nothing negative about your post at all, but I'm going to avoid getting into those discussions, as that can derail the intent of the thread fairly quickly. Clearly your post was well written and thought out; I'm just saying that that line of thought can slide into partisan bickering fairly easily :)
 

Fox5

Diamond Member
Jan 31, 2005
5,957
7
81
The fact that PC Video cards are so much more powerful than the Console GPUs is probably part of the problem.

1080p is becoming the standard in LCD TVs these days.

This begs the question: Why do PC video cards need to be so much more powerful than a Console? Are 2560x1600 LCD users the only ones who will want a discrete card?

Consoles are still often limited to below HD resolutions, and even then their graphics are poor compared to PCs.
Until the standard for consoles is 1080p @ 60fps, PCs will be quite a bit ahead, and there will always be higher levels of texture and effect quality for PCs to push.
 

ugaboga232

Member
Sep 23, 2009
144
0
0
What about MilkyWay@Home? Doesn't that get ridiculous numbers for ATI? Isn't Folding@home blatantly unoptimized for newer ATI hardware? Isn't Nvidia's huge lead in Metro 2033 due to an AA bug? Is Unigine anything other than a controversial benchmark at this point?