Anyone care to revisit an old discussion?

SlowSpyder

Lifer
Jan 12, 2005
17,305
1,001
126
I remember that several times during forum discussions here it was mentioned that AMD needs to change their SP/core architecture because it isn't efficient enough. I always thought the way AMD did things was fine, since they give you almost 100% of the performance with roughly 60% of the silicon.

One of the arguments that was brought up over and over is that the GT200 GPUs are so much larger because they are also aimed at being used as GPGPU processors in addition to being standard GPUs, so there is a lot of extra silicon dedicated to those GPGPU functions.

I asked in a few threads to be shown exactly what was used for GPGPU functionality in the GT200, but never really got an answer.

With the upcoming launch of Fermi/GT300, some details about the new chip are starting to come forward. One thing we know is that with this new GPU Nvidia is truly trying to make a move into the GPGPU world in earnest. If we take a look at this link (the table in the middle especially), we see a lot of changes from the GT200 architecture to Fermi that I understand are there for GPGPU functionality. But when looking at the G80 vs. the GT200, there aren't many changes at all.

I know in the GT200 days Nvidia liked to talk up and promote GPGPU ability, but I hadn't seen what they did at the silicon level to tweak the architecture so that it was truly aimed at GPGPU more than the G80/G92 GPUs before it. It seems like GPGPU was simply talked about more, and a push in that direction was made at that time, rather than the silicon being envisioned and built with excelling at those GPGPU functions in mind. Basically they marketed the GT200 as if it were built with that functionality in mind, but to me it seems more like they just created apps to work with the GPUs they had.

So I guess this brings us to Fermi (and the RV870). What about Fermi makes us think that it should be faster than a 5870 by a larger percentage than a GTX285 is faster than a 4890 (other than the fact that AMD more or less doubled everything, while Nvidia slightly more than doubled SPs by going 240 -> 512)? Also, if we look at AMD's 5870 architecture (here) and (here), you'll see that AMD also tweaked their architecture in a few areas.

From the PC Perspective article, Nvidia has gone from 16KB of L1 cache/shared memory to a 'configurable' 48KB. From the AT article, AMD has kept the L1 texture cache at the same 16KB as the prior gen, but increased its speed (1TB/sec), added a separate 8KB L1 cache for the SIMDs, and doubled the local data share to 32KB (from the prior gen). From PC Perspective, Nvidia is adding a separate 768KB of L2 cache; from the AT article, AMD has 4 separate L2 caches that they are doubling in size to 128KB from the prior gen.
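For anyone curious what 'configurable' means in practice, here's a rough CUDA-side sketch (the kernel and function names are made up purely for illustration; cudaFuncSetCacheConfig is the actual hint the Fermi-era runtime exposes for choosing between a shared-memory-heavy or L1-heavy split):

#include <cuda_runtime.h>

// Made-up kernel that stages data in shared memory and does a block-wide reduction.
__global__ void blockSum(const float* in, float* out, int n)
{
    __shared__ float tile[256];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();
    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (threadIdx.x < stride)
            tile[threadIdx.x] += tile[threadIdx.x + stride];
        __syncthreads();
    }
    if (threadIdx.x == 0)
        out[blockIdx.x] = tile[0];
}

void launchBlockSum(const float* d_in, float* d_out, int n)
{
    // Ask for the shared-memory-heavy split of Fermi's on-chip memory
    // (48KB shared / 16KB L1). The runtime treats this as a hint; older
    // parts with a fixed 16KB of shared memory simply ignore it.
    cudaFuncSetCacheConfig(blockSum, cudaFuncCachePreferShared);
    int blocks = (n + 255) / 256;
    blockSum<<<blocks, 256>>>(d_in, d_out, n);
}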

I won't lie and pretend I know how all of this will work differently in the different architectures, but again it looks like Nvidia will talk up GPGPU, and I'm not sure whether AMD is really far behind them at the silicon level and just needs to push the software, or whether Nvidia really is better optimized for GPGPU. Also, I'm not convinced that Nvidia had all of that silicon dedicated to GPGPU functions in the GT200; I think it's quite an accomplishment that AMD was able to deliver similar performance (though with more clock speed) with so much less silicon.

I figured this would at the very least make for some good discussion if anyone wants to jump in and tell me why I'm wrong... or better yet why I'm right. :p

Cliffs:
AMD GPUs are smaller than Nvidia GPUs but performance is similar.
Are Nvidia GPUs really more aimed towards GPGPU?
 

SlowSpyder

Lifer
Jan 12, 2005
17,305
1,001
126
Originally posted by: thilanliyan
Is Fermi actually MIMD like it was rumoured?

Unless I'm totally off on this (could be), wasn't the point of MIMD to increase double precision floating point performance? Or was there more to it than that, or was that not the case at all? If that is the case, I'll assume Nvidia did go to MIMD, since the PC Perspective table shows a very large increase in double precision performance over the GT200.

Again, unless I'm wrong - and I could be - double precision performance matters more in GPGPU than in games, hence the reason Nvidia would like to increase performance in that area.
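To make it concrete why that matters, here's a toy CUDA kernel of the sort GPGPU apps run in double precision (names invented, just an illustration). As I understand it, each GT200 SM had a single double-precision unit sitting next to eight single-precision SPs, so code like this crawled compared to its float version; the big DP jump in the PC Perspective table is aimed exactly at this kind of work:

#include <cuda_runtime.h>

// Toy double-precision kernel: y = a*x + y, the kind of math HPC/GPGPU codes lean on.
// Swap double for float and GT200 does fine; keep it double and the single DP unit
// per SM becomes the bottleneck.
__global__ void daxpy(int n, double a, const double* x, double* y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

void runDaxpy(int n, double a, const double* d_x, double* d_y)
{
    int blocks = (n + 255) / 256;
    daxpy<<<blocks, 256>>>(n, a, d_x, d_y);
}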
 

Wreckage

Banned
Jul 1, 2005
5,529
0
0
Originally posted by: SlowSpyder
AMD GPUs are smaller than Nvidia GPUs but performance is similar.

Are Nvidia GPUs really more aimed towards GPGPU?
Not really. NVIDIA was faster in games than ATI, in fact the GTX295 is still the fastest card available. The HD4xxx series was always behind.

The extra GPGPU capability is just icing on top of the gaming cake. Look how well Batman AA plays when PhysX is enabled.

With many games playable on even mid range cards, you need to offer your customers something more. I think this is why NVIDIA outsells ATI 2 to 1.

 

v8envy

Platinum Member
Sep 7, 2002
2,720
0
0
One word: unified memory model. This has absolutely nothing to do with gaming performance, but it is going to make programming the GPU in the C and C++ worlds easier. A flat-looking address space is far easier to grok than managing multiple heaps. Heck, a really bored someone might even port a JVM to that beast.

In any case, I can also see it not being free to implement in the memory manager, silicon-wise. Ditto for ECC.

I haven't looked at OpenCL or CUDA, but I'd hazard a guess the feature set of NV's language for Fermi is far, far richer and easier to use. And easier for compilers to produce optimized code for.
 

SlowSpyder

Lifer
Jan 12, 2005
17,305
1,001
126
Originally posted by: Wreckage

Not really. NVIDIA was faster in games than ATI, in fact the GTX295 is still the fastest card available. The HD4xxx series was always behind.

The extra GPGPU capability is just icing on top of the gaming cake. Look how well Batman AA plays when PhysX is enabled.

With many games playable on even mid range cards, you need to offer your customers something more. I think this is why NVIDIA outsells ATI 2 to 1.

I get it now. Any time a possible weak spot in Nvidia's armor is brought up, you try to deflect and then derail.

I was hoping to have a decent discussion on this with some more knowledgeable members who could tell me whether Nvidia's GPGPU superiority is just perceived or very much real. But I see you are already in derail mode, since it's possible that Nvidia will be shown in a bad light.
 

v8envy

Platinum Member
Sep 7, 2002
2,720
0
0
Originally posted by: Wreckage

The extra GPGPU capability is just icing on top of the gaming cake. Look how well Batman AA plays when PhysX is enabled.


Yes, we get it, there is one title NV got the developer to cripple on ATI hardware. Techies have demonstrated that it runs just peachy on 5870s, with or without an NV card for PhysX, once the blocks are hacked around. It's a completely artificial, developer-implemented lock-out. It would be trivial to add code to make any game run like garbage if an NV card is detected as well. We're all hoping game development doesn't head down that road.
 

SlowSpyder

Lifer
Jan 12, 2005
17,305
1,001
126
Originally posted by: v8envy
One word: unified memory model. This has absolutely nothing to do with gaming performance, but it is going to make programming the GPU in the C and C++ worlds easier. A flat-looking address space is far easier to grok than managing multiple heaps. Heck, a really bored someone might even port a JVM to that beast.

In any case, I can also see it not being free to implement in the memory manager, silicon-wise. Ditto for ECC.

I haven't looked at OpenCL or CUDA, but I'd hazard a guess the feature set of NV's language for Fermi is far, far richer and easier to use. And easier for compilers to produce optimized code for.

Thanks! Can you expand on the unified memory model? I'm just not too familiar with programming 'stuff'.
 

Wreckage

Banned
Jul 1, 2005
5,529
0
0
Originally posted by: v8envy
Originally posted by: Wreckage

The extra GPGPU capability is just icing on top of the gaming cake. Look how well Batman AA plays when PhysX is enabled.


Yes, we get it, there is one title NV got the developer to cripple on ATI hardware. Techies have demonstrated that it runs just peachy on 5870s, with or without an NV card for PhysX, once the blocks are hacked around. It's a completely artificial, developer-implemented lock-out. It would be trivial to add code to make any game run like garbage if an NV card is detected as well. We're all hoping game development doesn't head down that road.

:roll:

I'm assuming you are talking about anti-aliasing, while I am talking about PhysX as it relates to GPGPU and how that can apply to gaming.
 

v8envy

Platinum Member
Sep 7, 2002
2,720
0
0
Originally posted by: SlowSpyder

Thanks! Can you expand on the unified memory model? I'm just not too familiar with programing 'stuff'.

I could, but you'd find exactly what you're looking for here: http://www.nvidia.com/content/...putingArchitecture.pdf

TL;DR version: it sounds like it'll be possible for kernels to view memory as a flat address space and not care whether it's thread-local, shared, cached, or out over the PCIe bus. The memory manager will do the heavy lifting as opposed to the programmer or compiler.
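A rough sketch of what that buys you in device code, assuming I'm reading the whitepaper right (kernel and names are invented): today the compiler has to pin down at compile time which memory space a pointer refers to, whereas with a unified address space a generic pointer can resolve at run time, so one helper works on shared and global memory alike.

#include <cuda_runtime.h>

// Helper that doesn't care which memory space the pointer lives in.
__device__ float sumRange(const float* p, int n)
{
    float s = 0.0f;
    for (int i = 0; i < n; ++i)
        s += p[i];
    return s;
}

__global__ void demo(const float* g_data, float* g_out, int n)
{
    __shared__ float tile[128];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = (i < n) ? g_data[i] : 0.0f;
    __syncthreads();

    if (threadIdx.x == 0) {
        // Same function, two different memory spaces behind the pointer.
        float localPart  = sumRange(tile, blockDim.x);   // shared memory
        float globalPart = sumRange(g_data, n);          // global (device) memory
        g_out[blockIdx.x] = localPart + globalPart;
    }
}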

 

v8envy

Platinum Member
Sep 7, 2002
2,720
0
0
Originally posted by: Wreckage


I'm assuming you are talking about anti-aliasing, while I am talking about PhysX as it relates to GPGPU and how that can apply to gaming.

No, I'm talking about ATI's hardware being just as capable of running PhysX (if it were implemented on top of OpenCL rather than CUDA) as NV's, which is what this thread is about.

There was even a port of PhysX to ATI's earlier hardware.

TL;DR: hardware just as good, just much smaller. Lack of functionality is completely a software issue.
 

Wreckage

Banned
Jul 1, 2005
5,529
0
0
Originally posted by: v8envy


I'm assuming you are talking about anti-aliasing, while I am talking about PhysX as it relates to GPGPU and how that can apply to gaming.
I doubt it.

Look at Folding@home for example. NVIDIA outperforms ATI by a large margin.

Video encoding is another example. ATI still relies heavily on the CPU.

Like I said it was a good move by NVIDIA to not only provide the fastest gaming but also the fastest and most utilized applications.

 

Vertibird

Member
Oct 13, 2009
43
0
0
Originally posted by: Wreckage
Originally posted by: SlowSpyder
AMD GPUs are smaller than Nvidia GPUs but performance is similar.

Are Nvidia GPUs really more aimed towards GPGPU?
Not really. NVIDIA was faster in games than ATI, in fact the GTX295 is still the fastest card available. The HD4xxx series was always behind.

The extra GPGPU capability is just icing on top of the gaming cake. Look how well Batman AA plays when PhysX is enabled.

With many games playable on even mid range cards, you need to offer your customers something more. I think this is why NVIDIA outsells ATI 2 to 1.

If PhysX is so great why don't they let customers mix cards on mobos? (ie, we buy ATI cards for the superior gaming value, but then mix it with a Nvidia card for Physx)

Or maybe Nvidia doesn't want us to use Lucid hydra? But why would they care since they don't even want to make chipsets anymore?
 

Wreckage

Banned
Jul 1, 2005
5,529
0
0
Originally posted by: Vertibird
If PhysX is so great why don't they let customers mix cards on mobos?

You can blame Vista for that. NVIDIA probably does not want to have to deal with the headache of incompatibility issues.
 

thilanliyan

Lifer
Jun 21, 2005
11,848
2,051
126
Well this thread went down the drain real quick. And it might have actually had some interesting discussions. :(
 

Schmide

Diamond Member
Mar 7, 2002
5,581
712
126
I haven't completely figured out what improvements the GT300 will bring, other than optimizations to make it perform better.

The lack of any real program flow control without a major penalty is still a major limiting factor for all GPGPUs; where Larrabee will fall short in pure calculation throughput, it should shine in this area.
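To put the flow-control point in concrete terms, here's a tiny CUDA sketch (made-up kernel): threads in a warp that take different sides of a branch get serialized, so the warp effectively pays for both paths.

#include <cuda_runtime.h>

__global__ void divergent(float* data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    if (i % 2 == 0)                      // even/odd split inside every 32-thread warp
        data[i] = sinf(data[i]) * 2.0f;  // half the warp idles while the other half runs this...
    else
        data[i] = cosf(data[i]) * 0.5f;  // ...then they trade places
}

// Branching on something warp-uniform (e.g. blockIdx.x % 2) avoids the serialization,
// which is the sort of restructuring GPGPU code has to do today to dodge the penalty.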

Nice derailing of the thread Weakage.

EDIT: Excellent post SlowSpyder
 

evolucion8

Platinum Member
Jun 17, 2005
2,867
3
81
Let's just ignore Crapckage along with all his cheap marketing tantrums.

I'm quite interested in the discussion. Using SiSandra I found out the following.

I know the tests are synthetic, but they may explain why nVidia cards aren't much faster than ATi cards in media encoding or in workloads which aren't bandwidth bound.

In float shader performance, my card is barely faster than a GTX 280; in double shader performance, my card is more than 4 times faster than a GTX 280. Which means that in computation power, nVidia has fewer execution resources, but it is more efficient in using them than ATi's architecture. In double precision, though, the GTX architecture is a pita.

In internal memory bandwidth the cache hierarchy plays an important role, and even though ATi has a Local Data Share (something the GTX cards lack), the GTX 280 is almost 3 times faster than my HD 4870 in Internal Memory Bandwidth. Funny enough, in the Data Transfer Bandwidth test, which measures system-to-device bandwidth, the HD 4870 is twice as fast as the GTX 280. So in the end, the GTX series architecture, which is more oriented towards thread encapsulation, proves more efficient at maximizing resource usage and loves lots of small threads, while the HD 4x00 architecture loves to have a huge amount of threads (an inelegant approach). The results are below.

Float Shaders Performance

HD 4870 448.71
GTX 280 434.92

Double Shaders Performance

HD 4870 222.25
GTX 280 54.33

Internal Memory Bandwidth

HD 4870 46.200GB/s
GTX 280 113.206GB/s

Data Transfer Bandwidth

HD 4870 4.336GB/s
GTX 280 2.664GB/s
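For reference, that Data Transfer Bandwidth number is basically a timed host-to-device copy. SiSandra's exact methodology is its own thing, but a bare-bones CUDA version (buffer size and names arbitrary) looks something like this:

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    const size_t bytes = 256u * 1024u * 1024u;   // 256 MB test buffer (arbitrary)
    float *h_buf = 0, *d_buf = 0;
    cudaMallocHost((void**)&h_buf, bytes);       // pinned host memory, as a transfer test should use
    cudaMalloc((void**)&d_buf, bytes);

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start, 0);
    cudaMemcpy(d_buf, h_buf, bytes, cudaMemcpyHostToDevice);
    cudaEventRecord(stop, 0);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);      // milliseconds between the two events
    printf("Host->device: %.2f GB/s\n", (bytes / 1e9) / (ms / 1e3));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_buf);
    cudaFreeHost(h_buf);
    return 0;
}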
 

SlowSpyder

Lifer
Jan 12, 2005
17,305
1,001
126
Originally posted by: Wreckage
Originally posted by: v8envy


I'm assuming you are talking about anti-aliasing, while I am talking about PhysX as it relates to GPGPU and how that can apply to gaming.
I doubt it.

Look at Folding@home for example. NVIDIA outperforms ATI by a large margin.

Video encoding is another example. ATI still relies heavily on the CPU.

Like I said it was a good move by NVIDIA to not only provide the fastest gaming but also the fastest and most utilized applications.

As I understand it, F@H is better on Nvidia because it just runs on their SPs and works. With AMD there'd have to be tweaking done to make use of AMD's one 'big' SP coupled with 4 'small' SPs. We really won't have a fair picture until that tweaking is done.

AMD uses both the CPU and GPU, and from an end user perspective that probably works better. The job can be broken up so the parts that will fare better on a GPU get sent there, and the parts better suited to the CPU stay there. Remember, a PC is a platform of components, not just a video card. Using all of those parts to get the job done most efficiently is best, in my opinion.

It is a good move for Nvidia to branch out; as it looks now, they won't survive long if they stick to discrete GPUs. From what I've been reading, the direction AMD and Intel are going will basically do away with discrete GPUs at some point. As far as who's faster, well, that depends on the price point. A 5870 is generally as fast as a GTX295, yet cheaper, for example. There really aren't many price points at which Nvidia has the better card from a performance perspective.
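As a rough sketch of that splitting-up idea (all names invented, CUDA purely for illustration): the independent per-element work goes to the GPU, and the short, order-dependent tail stays on the CPU, where dependency chains run fine.

#include <vector>
#include <cuda_runtime.h>

// GPU piece: every element is independent, which is exactly what the SPs want.
__global__ void transformKernel(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = expf(-in[i] * in[i]);
}

float processOnPlatform(const std::vector<float>& h_in)
{
    int n = (int)h_in.size();
    float *d_in = 0, *d_out = 0;
    cudaMalloc((void**)&d_in,  n * sizeof(float));
    cudaMalloc((void**)&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    transformKernel<<<(n + 255) / 256, 256>>>(d_in, d_out, n);

    std::vector<float> h_out(n);
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);

    // CPU piece: each step depends on the previous result, so it stays serial
    // and the CPU handles it without breaking a sweat.
    float state = 0.0f;
    for (size_t i = 0; i < h_out.size(); ++i)
        state = 0.9f * state + h_out[i];
    return state;
}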
 

Idontcare

Elite Member
Oct 10, 1999
21,118
58
91
Originally posted by: SlowSpyder
Cliffs:
AMD GPUs are smaller than Nvidia GPUs but performance is similar.

It seems to me that you are assuming both architectures have been equally optimized in their respective implementations when making comparisons that involve things like die-size.

Let me use an absurd example to show what I mean.

Suppose NV's decision makers decided they were going to fund GT200 development but gave the project manager the following constraints: (1) development budget is $1m, (2) timeline budget is 3 months, and (3) performance requirements were that it be on-par with anticipated competition at time of release.

Now suppose AMD's decision makers decided they were going to fund RV770 development but gave the project manager the following constraints: (1) development budget is $10m, (2) timeline budget is 30 months, (3) performance requirements were that it be on-par with anticipated competition at time of release, and (4) make it fit into a small die so as to reduce production costs.

Now in this absurd example the AMD decision makers are expecting a product that meets the stated objectives, and having resourced it 10x more so than NV did their comparable project, one would expect the final product to be more optimized (fewer xtors, higher xtor density, smaller die, etc) than NV's.

In industry jargon the concepts I am referring to here are called R&D Efficiency and Entitlement.

Now of course we don't know whether NV resourced the GT200 any less than AMD resourced the RV770, and likewise for Fermi vs. Cypress. But what we can't conclude from die size and xtor density comparisons is that one should be superior to the other in those metrics, without having access to the budgetary information that factored into the project management aspects of decision making and tradeoff downselection.

This is no different than comparing, say, AMD's PhII X4 versus the nearly identical-in-die-size Bloomfield. You could argue that Bloomfield shows AMD should/could have implemented PhII X4 as a smaller die, or that they should/could have made PhII X4 performance higher (given that Intel did)... or you could argue that AMD managed to deliver 90% of the performance while only spending 25% of the coin.

It's all how you want to evaluate the metrics of success in terms of entitlement or R&D efficiency (spend 25% the budget and you aren't entitled to expect your engineers to deliver 100% the performance, 90% the performance is pretty damn good).

So we will never know how much of GT200's diesize is attributable to GPGPU constraints versus simply being the result of timeline and budgetary tradeoffs made at the project management level versus how similar tradeoff decisions were made at AMD's project management level.
 

evolucion8

Platinum Member
Jun 17, 2005
2,867
3
81
Originally posted by: SlowSpyder
As I understand it, F@H is better on Nvidia because it just runs on their SPs and works. With AMD there'd have to be tweaking done to make use of AMD's one 'big' SP coupled with 4 'small' SPs. We really won't have a fair picture until that tweaking is done.

AMD uses both the CPU and GPU, and from an end user perspective that probably works better. The job can be broken up so the parts that will fare better on a GPU get sent there, and the parts better suited to the CPU stay there. Remember, a PC is a platform of components, not just a video card. Using all of those parts to get the job done most efficiently is best, in my opinion.

It is a good move for Nvidia to branch out; as it looks now, they won't survive long if they stick to discrete GPUs. From what I've been reading, the direction AMD and Intel are going will basically do away with discrete GPUs at some point. As far as who's faster, well, that depends on the price point. A 5870 is generally as fast as a GTX295, yet cheaper, for example. There really aren't many price points at which Nvidia has the better card from a performance perspective.

It's also known that the GPU client is not optimized for the HD 4x00 architecture, since the client isn't even using the Local Data Share found in the RV770 architecture; the client just treats the HD 4x00 series as an HD 3x00 card. I'm not completely sure about that, though.

In MilkyWay@Home, it was proven that the GPU client ran more than 100 times faster on an HD 4850 compared to the CPU version.

http://www.brightsideofnews.co...s-in-milkywayhome.aspx

http://milkyway.cs.rpi.edu/mil...orum_thread.php?id=589

http://www.gpugrid.net/forum_thread.php?id=705

http://www.overclock.net/ati/5...10-1-release-more.html

Quite a lot of links. It seems that GPGPU performance varies greatly at the architecture level depending on the type of work. MilkyWay@Home needs double precision, hence the excellent performance of the RV7x0 architecture. I couldn't find performance data for the nVidia client, but since it's pretty much 4 times slower in double precision, I wouldn't be surprised that it hasn't been released yet. Improving double precision performance with the Fermi architecture was a good move by nVidia.
 

Wreckage

Banned
Jul 1, 2005
5,529
0
0
Originally posted by: evolucion8


In MilkyWay@Home, it was proven that the GPU client ran more than 100 times faster on an HD 4850 compared to the CPU version.

From the first link...

UPDATE #1, March 25th, 2009 10:26AM CET: In the article, we have used the base of 100 for performance basis, and we weren't clear about it. In November of 2008, Milkyway@Home code was 100x slower than it is now after Andreas "Gipsel" Przystawik did a brilliant set of code optimizations. This was efficiency test of Gipsel's code, not the original Milkyway@Home code.
 

Vertibird

Member
Oct 13, 2009
43
0
0
Wreckage,

Why doesn't Nvidia release a die shrink of the GTX 285 or GTS 250? Wouldn't they save considerable money by doing this (instead of making GPUs on the expensive 55nm process)?
 

alyarb

Platinum Member
Jan 25, 2009
2,444
0
76
i'm just coming in to say i seriously laughed out loud (i'm by myself) when I read wreckage's response to the OP. he didn't respond to a single question the OP presented.

just GTX 295 is still the fastest. check out batman with physx. Radeon 4000 was always behind, and they are outsold 2:1.

heh, thank you wreckage. everyone appreciates it.
 

bryanW1995

Lifer
May 22, 2007
11,144
32
91
Originally posted by: Wreckage
Originally posted by: v8envy


I'm assuming you are talking about anti-aliasing, while I am talking about PhysX as it relates to GPGPU and how that can apply to gaming.
I doubt it.

Look at Folding@home for example. NVIDIA outperforms ATI by a large margin.

Video encoding is another example. ATI still relies heavily on the CPU.

Like I said it was a good move by NVIDIA to not only provide the fastest gaming but also the fastest and most utilized applications.

At least rollo was sometimes interesting...
 

Wreckage

Banned
Jul 1, 2005
5,529
0
0
Originally posted by: alyarb
i'm just coming in to say i seriously laughed out loud (i'm by myself) when I read wreckage's response to the OP. he didn't respond to a single question the OP presented.

just GTX 295 is still the fastest. check out batman with physx. Radeon 4000 was always behind, and they are outsold 2:1.

heh, thank you wreckage. everyone appreciates it.

I answered both his questions. Why not address that instead of chasing after me? :confused:

My statements were in direct response to his summary. I think he and the other "zoners" just wanted a one-sided discussion.

"AMD GPU's are smaller than Nvidia GPU's but performance is similar.
Are Nvidia GPU's really more aimed towards GPGPU?

"


Both statements are false as any gaming benchmark will show you.