"Inevitable Bleak Outcome for nVidia's Cuda + Physx Strategy"

SirPauly

Diamond Member
Apr 28, 2009
5,187
1
0
Originally posted by: Nemesis 1
Well, the visual pic I just got from that made me LOL.

Especially with the effects GPU Physics may enhance with a lot of dynamic movement -- maybe nVidia will create an APEX moon-ing module, hehe! :)
 

Scali

Banned
Dec 3, 2004
2,495
1
0
Originally posted by: BFG10K
Your recollection is wrong. Also the fact that you ignored the two examples of MSAA I gave you and instead discussed Quincunx tells me, to quote you, "that you don't really understand" the significance of those modes. Also to quote yourself: "you don't fool me".

Oh please.
I never denied that nVidia called this technique multisampling.
I was just saying that it is NOT the multisampling that is standardized in OpenGL and Direct3D. Obviously I was talking about the standardized multisampling algorithm, as is everyone else, because that is the only algorithm actually supported by the major APIs.
I even gave a direct link to a Wikipedia page explaining multisampling as it is defined by the GL_ARB_multisample OpenGL extension. This is the same multisampling as Direct3D uses. And this is exactly the type of multisampling that the DX10.1 multisample readback feature is using.

You can't argue that GeForce3/4 perform the same algorithm as R300, because that simply isn't true. That much is clear from the links I posted.
You also can't argue that the GeForce3/4 performs multisampling in the way OpenGL and Direct3D standardized it, because that also isn't true.
So what exactly do you want to argue? That nVidia used the term 'multisampling' in their presskit? I never denied that. So I don't need to retract anything.

If anything, it is nVidia spreading the misinformation.
I get the same feeling as with that zbuffer/rasterization stuff. It's useless to argue, because nobody knows how it works anyway. Why don't you go ask the people at Beyond3D or something. Get a second opinion. I'm sure they'll tell you the same that I did, because it's just a simple fact to people who understand AA algorithms.
 

Scali

Banned
Dec 3, 2004
2,495
1
0
Originally posted by: BFG10K
Hmm, two samples, as in 2xRGMS, which is exactly what I said earlier.

So you have two colour samples and 2 depth samples. Sounds like SSAA to me.
They're not supersampling the zbuffer. The 5-sample filter then is purely on the colourbuffer.
How does that fit with:
"The specification dictates that the renderer evaluate one color, stencil, etc. value per pixel, and only "truly" supersample the depth value."

You could almost say that they are doing the OPPOSITE. They take more colour samples than depth samples, rather than taking more depth samples than colour samples.

Originally posted by: BFG10K
But hey, I guess nVidia, Tom, Anand (et al.) are all wrong when they say the card supports MSAA.

Technically, no.
However, since R300 and the standardization of multisampling in OpenGL and Direct3D, nobody uses nVidia's technique anymore, and when talking about 'multisampling' they always refer to the standardized technique. Which is obviously what I was referring to as well.
The confusion is because GeForce3/4 came out in a time when this standard was not set yet. Hence talk of multisampling in those days could be a bit confusing.
These days we have a good standard.
The Wikipedia page I linked, also explained that:
"In graphics literature in general, "multisampling" refers to any special case of supersampling where some components of the final image are not fully supersampled. For example, a real-world multisampling implementation may also supersample stencil values."

The GeForce3/4 AA methods are special cases of supersampling, no doubt about that. But not the specific optimization of the standard.

So my final offer was and still is:
"nVidia only support their own special case of supersampling, not multisampling as it is standardized in OpenGL and Direct3D and known to everyone today".
But you refused to accept that, and insist that I retract everything. I refuse to retract what I said, because what I said is a simple fact.
Besides, the point was that ATi's algorithm was the first to actually make it EFFICIENT, and people actually started to USE AA (and AF) in games.
You don't have to take my word for THAT either....
http://en.wikipedia.org/wiki/Radeon_9700
"Radeon 9700's advanced architecture was very efficient and, of course, more powerful compared to its older peers of 2002. Under normal conditions it beat the GeForce4 Ti 4600, the previous top-end card, by 15?20%. However, when anti-aliasing (AA) and/or anisotropic filtering (AF) were enabled it would beat the Ti 4600 by anywhere from 40?100%. At the time, this was quite astonishing, and resulted in the widespread acceptance of AA and AF as critical, truly usable features."

In fact, they thought this was so important, they EVEN listed it on the GeForce4 page:
http://en.wikipedia.org/wiki/GeForce4
"ATI's resulting Radeon 9700 Pro defeated the Ti 4600 by 15?20% in normal conditions. However, when anti-aliasing (AA) and/or anisotropic filtering (AF) were enabled, the 9700 would beat the Ti 4600 by anywhere from 40?100%."
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
Well if nV is going to be trouble, they're just going to have to buy them out then, won't they?

This may have been possible not that long ago, but it isn't anymore. With their having been found guilty of antitrust violations towards AMD, they will not be allowed to purchase AMD's second-largest competitor.

Uhh, zbuffering is done per pixel, not per triangle.
Doesn't sound like you know how a z-buffered rasterizer works.

WTF? We are discussing visibility checks; are you going to render everything and then do a visibility check to see what to render? That doesn't make any sense. Visibility determination on geometry is the issue with cache and possible bandwidth on deferred renderers; this hasn't changed.

They didn't bother to describe AA specifically because it's trivial, as I said, just an extra iteration to refine the grid you're rasterizing.

On the software side it is trivial; on the hardware side, dealing with the increased amount of data that needs to be handled by a deferred rendering device is not.

You seem to sling some vague terms and statements around, but I think you fail to grasp the basics of rasterizing, such as supersampling/multisampling and zbuffering (or the difference between a multicore CPU and GPGPU programming model). You're not fooling me.

Heh, this is particularly amusing given your decision to expose yourself a bit later on :)

As far as I recall, GeForce3/4 only used that Quincunx thing, which nVidia may have called multisampling, but is not the same algorithm as the one used by ATi in the R300 and all later GPUs. THAT is the algo that is now known as multisampling in the Direct3D API. Not the one nVidia used.

NV2x cores most certainly were multisampling parts; the fact that you would try to debate this point is rather shocking, honestly. You can easily see it for yourself by doing a straight stencil test to compare performance metrics if you are so inclined. This is trivial and really quite basic.

I see you link such great sites as Wiki to prove your point; how about using Dave Baumann as a reference?

Link. NV2X parts were multisample parts that also offered Quincunx. This is obscenely elementary.

Besides, if you're going to be arrogant and smug like the way Ben chose to argue, you deserve to be called out when you are talking nonsense.

Really now, we should use final render data to determine visibility of a scene prior to rendering and then we have your extensive knowledge of 3D hardware on display. Heh, you are a different sort.

"ATI's resulting Radeon 9700 Pro defeated the Ti 4600 by 15?20% in normal conditions. However, when anti-aliasing (AA) and/or anisotropic filtering (AF) were enabled, the 9700 would beat the Ti 4600 by anywhere from 40?100%."

BFG's blog idea is sounding like a good one for you, man. How could we possibly explain a nearly 100% performance increase when additional bandwidth is the limiting performance factor comparing the R300 and nV2x? R9700 Pro: 19.8 GB/sec; Ti 4600: 10.4 GB/sec. Wow, it is almost a direct linear correlation. What is even more amusing is that they had almost identical GTexel throughput. Funny that; the only thing that would properly explain a huge rift in performance is if they were bandwidth limited. Kind of like they were both doing MSAA, which happens to be what Dave Baumann, currently an ATi employee BTW, said they did.
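(A quick sanity check of that "almost a direct linear correlation", using the bandwidth figures quoted above:

```latex
\frac{19.8\ \text{GB/s}}{10.4\ \text{GB/s}} \approx 1.90
```

i.e. roughly 90% more bandwidth for the R9700 Pro, which sits right at the top of the 40-100% AA/AF performance lead quoted earlier in the thread.)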

Don't assume people know less about something than you do.
 

Scali

Banned
Dec 3, 2004
2,495
1
0
Originally posted by: BenSkywalker
WTF? We are discussing visibility checks; are you going to render everything and then do a visibility check to see what to render? That doesn't make any sense. Visibility determination on geometry is the issue with cache and possible bandwidth on deferred renderers; this hasn't changed.

Z-buffering IS the visibility check:
http://en.wikipedia.org/wiki/Z_buffer
You calculate the z-value of each pixel to see if that pixel is visible. If not, shading can be skipped. Makes perfect sense, all modern 3d hardware works that way.
Of course there is also hidden surface removal through backface culling, but since that is done before rasterizing, I don't see how that part is relevant here.
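For reference, a minimal sketch of the per-pixel test being described (illustrative code with toy types; not any particular GPU's implementation, which would add hierarchical-Z, compression, and so on):

```cpp
#include <vector>
#include <limits>

// Minimal z-buffer visibility check: a fragment is shaded only if its
// depth is closer than whatever has already been drawn at that pixel.
struct ZBuffer {
    int width, height;
    std::vector<float> depth;

    ZBuffer(int w, int h)
        : width(w), height(h),
          depth(static_cast<size_t>(w) * h,
                std::numeric_limits<float>::infinity()) {}

    // Returns true if the fragment at (x, y) with depth z is visible.
    // On success the stored depth is updated, so later, farther fragments
    // at the same pixel fail the test and their shading can be skipped.
    bool testAndWrite(int x, int y, float z) {
        float& stored = depth[static_cast<size_t>(y) * width + x];
        if (z < stored) {
            stored = z;     // this fragment is now the closest so far
            return true;    // visible: run shading
        }
        return false;       // occluded: shading can be skipped
    }
};
```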

Also, I already pointed out that Larrabee is NOT a deferred ('infinite planes') renderer.
It's a TBR (Tile-based renderer) not a TBDR (Tile-based deferred renderer).
Or at least, the implementation of its Direct3D and OpenGL renderers aren't deferred.

Or how exactly do you think it works? I'd like to hear you explain it so I know what you are talking about. What exactly ARE the 'visibility checks' that you think Larrabee (or any other hardware for that matter) does?

Originally posted by: BenSkywalker
On the software side it is trivial; on the hardware side, dealing with the increased amount of data that needs to be handled by a deferred rendering device is not.

What are you getting at here, anyway?
Larrabee *is* a software renderer. There is no specific hardware involved in AA as far as we know.

Again, it's NOT a deferred renderer.

Originally posted by: BenSkywalker
Really now, we should use final render data to determine visibility of a scene prior to rendering and then we have your extensive knowledge of 3D hardware on display. Heh, you are a different sort.

Wait, you REALLY don't understand z-buffering, and the fact that ALL modern hardware uses it?
What I just told you is NEW to you? OMG!

Thing is, you didn't exactly present any alternatives either. You're claiming I'm wrong, yet you don't point out what is wrong, let alone explain the right way.
This makes your theory a whole lot less plausible than mine, since I actually linked to an explanation of what zbuffering does, which supports everything I said.
In your case you're just implying that there is some kind of magic algorithm, which you fail to even name, let alone explain how it works, and why it would be used instead of zbuffering.

Originally posted by: BenSkywalker
Don't assume people know less about something than you do.

I didn't assume, you made it painfully obvious.

As for the MSAA, I never denied that nVidia calls their solution multisampling, or even that, generally speaking, the term could be used in that sense.
That explains ALL the arguments people have given so far, and they have nothing to do with the point I made, which was of an algorithmic nature, not about what name they put on the box. Why don't we just drop the subject, because we're only going round in circles.
Dave Baumann actually points out how the 'multisampling' is handled by nVidia:
"Because multisampling only takes one texture sample for all the subsamples, under GeForce3's 2x & Qunincunx schemes the texture sample would be exactly suitable for sample point 1 (being in the centre of the sample) but not very suitable for sample point 2 (as this is at the very edge of the pixel, bordering the next)."

They're saving on texture fetches, but NOT on shading in general. So yes, the texture sampling resolution is lower than that of other parts of the sampling. In that sense it's 'multisampling'. But the key to multisampling as we know it in OpenGL/Direct3D is that ONLY the depthbuffer is sampled at a higher resolution. This is what allows for huge bandwidth savings and framebuffer compression, which make the MSAA of the R300 and later GPUs so efficient (and explains why it doesn't work on bumpmapping, unlike nVidia's solution).
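To make the distinction concrete, here is a minimal sketch of the two per-pixel loops (toy types and stub functions, purely illustrative; neither vendor's actual hardware path):

```cpp
#include <array>

// Toy color type for illustration only.
struct Color { float r = 0, g = 0, b = 0; };
static Color operator+(Color a, Color b) { return {a.r + b.r, a.g + b.g, a.b + b.b}; }
static Color operator*(Color c, float k) { return {c.r * k, c.g * k, c.b * k}; }

constexpr int kSub = 4;   // subsamples per pixel

// Stubs standing in for the expensive parts (assumptions, not real APIs).
static Color runPixelShader(float x, float y) { return {x, y, 1}; }    // shading + texturing
static bool  depthTestAndWrite(int sub, float z) { return z < 1.0f; }  // per-subsample z test

// Supersampling: the whole shader (texturing included) runs per subsample.
static Color shadeSSAA(const std::array<float, kSub>& sx,
                       const std::array<float, kSub>& sy) {
    Color sum;
    for (int s = 0; s < kSub; ++s)
        sum = sum + runPixelShader(sx[s], sy[s]);   // N shader runs per pixel
    return sum * (1.0f / kSub);
}

// OpenGL/Direct3D-style multisampling: one shader run at the pixel center;
// only the depth test is per subsample, and that single color is replicated
// into every covered subsample. A resolve pass then averages the subsamples.
static Color shadeMSAA(float cx, float cy,
                       const std::array<float, kSub>& subZ,
                       std::array<Color, kSub>& subColor) {
    Color c = runPixelShader(cx, cy);               // 1 shader run per pixel
    for (int s = 0; s < kSub; ++s)
        if (depthTestAndWrite(s, subZ[s]))          // only z is supersampled
            subColor[s] = c;                        // replicate, don't reshade
    Color sum;
    for (int s = 0; s < kSub; ++s) sum = sum + subColor[s];
    return sum * (1.0f / kSub);                     // resolve (average)
}
```

The replication is also where the compression win comes from: for fully covered pixels all subsample colors are identical, which is exactly what framebuffer compression exploits.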
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
Z-buffering IS the visibility check:
http://en.wikipedia.org/wiki/Z_buffer
You calculate the z-value of each pixel to see if that pixel is visible. If not, shading can be skipped. Makes perfect sense, all modern 3d hardware works that way.
Of course there is also hidden surface removal through backface culling, but since that is done before rasterizing, I don't see how that part is relevant here.

Why do you quote Wiki, man? Why not try Foley & Van Dam? And obviously z-buffering is the viz check for a traditional rasterizer.

Also, I already pointed out that Larrabee is NOT a deferred ('infinite planes') renderer.
It's a TBR (Tile-based renderer) not a TBDR (Tile-based deferred renderer).

LRB itself isn't any particular sort of renderer; the rendering technique Abrash explains looks like deferred to me. Tiling your output data to memory is standard for all GPU makers; it doesn't make it a 'tile based renderer'.

What exactly ARE the 'visibility checks' that you think Larrabee (or any other hardware for that matter) does?

Intel's clearly stated goal is that LRB is supposed to be an RTR (real-time raytracer), so their design goal would be that the visibility check is a 'ray'. How it handles rasterization still hasn't been nailed down; looking over Abrash's paper, it seemed like he was shooting for deferred rendering, as LRB is going to have significantly less raw fill than a rasterizer. Abrash's approach needn't be the only one of course, but the limitations of the chip seem to make it clear you aren't going to use brute force if you hope to approach even the mid range.

Wait, you REALLY don't understand z-buffering, and the fact that ALL modern hardware uses it?

RTR won't use anything like what we currently use for ZBuffering.

As for the MSAA, I never denied that nVidia calls their solution multisampling, or even that generally speaking that term could be used in that sense.

NV2x's AA is pure MSAA. Quincunx is not; it is a hybrid mode. Their MSAA uses higher fidelity sampling of Z data, that is it. It is entirely a pure MSAA mode; the only 'fill' you can make it use at any setting is pure stencil, which in the case of nV is also used for Z. There is no additional color value sampled; it is entirely a 'pure' MSAA mode in every sense no matter what standard you use (unless you want to go way back to accumulation-buffer style supersampling; those would be on Irix machines, but you must be quite familiar with those, heh).

They're saving on texture fetches, but NOT on shading in general. So yes, the texture sampling resolution is lower than that of other parts of the sampling. In that sense it's 'multisampling'. But the key to multisampling as we know it in OpenGL/Direct3D is that ONLY the depthbuffer is sampled at a higher resolution. This is what allows for huge bandwidth savings and framebuffer compression, which make the MSAA of the R300 and later GPUs so efficient (and explains why it doesn't work on bumpmapping, unlike nVidia's solution).

Based on YOUR links the MSAA performance difference between the R300 and NV2X almost exactly matches the bandwidth difference. NV's MSAA also does not work on bumpmapping at all. I have a GF4 here running in an old rig btw, not like I'm trying to guess.
 

Scali

Banned
Dec 3, 2004
2,495
1
0
Originally posted by: BenSkywalker
Why do you quote Wiki, man? Why not try Foley & Van Dam?

Is there an online version I can link to then?
Besides, Foley & Van Dam don't contradict Wiki.

Originally posted by: BenSkywalker
LRB itself isn't any particular sort of renderer; the rendering technique Abrash explains looks like deferred to me.

What makes you say that?

Originally posted by: BenSkywalker
Intel's clearly stated goal is that LRB is supposed to be an RTR (real-time raytracer), so their design goal would be that the visibility check is a 'ray'.

That's a completely different algorithm, not compatible with current Direct3D or OpenGL standards. Which is why Abrash describes a rasterizer, not a raytracer.
Ironically enough, raytracing is actually a rather slow way of determining visibility.
Software such as 3dsmax uses a z-buffered rasterizer for all the direct visibility, and uses rays only for reflections, lighting, shadows and such.

Originally posted by: BenSkywalker
RTR won't use anything like what we currently use for ZBuffering.

Obviously not, but we're now talking about the Direct3D/OpenGL rasterizer implementation on Larrabee, not any kind of future raytracing APIs.

Originally posted by: BenSkywalker
Based on YOUR links the MSAA performance difference between the R300 and NV2X almost exactly matches the bandwidth difference.

Now if we were to look at the Radeon 9500 Pro and 9600 Pro, they actually have slightly less bandwidth than the GeForce4 Ti4600.
Yet they STILL outperform the GeForce4 Ti4600 in 4xAA:
http://www.anandtech.com/showdoc.aspx?i=1812&p=12
How is that possible?
 

BFG10K

Lifer
Aug 14, 2000
22,709
3,007
126
Originally posted by: Scali

I never denied that nVidia called this technique multisampling.
You claimed the GF4 only had super-sampling when this is patently false.

You claimed the GF3 didn't have MSAA when this is patently false.

You also provided "evidence" by showing links to Quincunx when this mode actually has nothing to do with the pure MSAA modes I was discussing.

I was just saying that it is NOT the multisampling that is standardized in OpenGL and Direct3D.
Yes it is. It takes multiple color/z/stencil samples while only taking one shader & texture sample. That's multi-sampling. The sample positions were changed on the NV25 and again on the NV40, but it's still multi-sampling.

You can't argue that GeForce3/4 perform the same algorithm as R300, because that simply isn't true.
I didn't say they had the same algorithm; I said the GF3 was the first board in consumer space to offer MSAA in the form of 2xRGMS and 4xOGMS. Don't put words in my mouth and state things I never said.

That much is clear from the links I posted.
No it isn't. You posted links to Quincunx, which has nothing to do with the 2xRGMS/4xOGMS modes I mentioned earlier.

Again, I still don't think you understand that these modes are pure MSAA modes, and that Quincunx is simply 2xRGMS combined with a post-filter. In fact Quincunx is akin to ATi's narrow & wide tent modes which use MSAA as the base but also sample pixels outside of the current one.

Your argument is like stating ATi doesn't have MSAA, and then linking to documents about CFAA to "prove" it.

You also can't argue that the GeForce3/4 performs multisampling in the way OpenGL and Direct3D standardized it, because that also isn't true.
It is true; that you can't understand that Quincunx isn't part of this discussion isn't my problem.

Why don't you go ask the people at Beyond3D or something.
We did; Ben provided you with a link. Here's another, complete with IQ comparisons:

http://www.nvnews.net/previews...ce3/antialiasing.shtml

I'll walk you down the page with the tested AA modes on the GF3: 2xAA, 4xAA & Quincunx. The first two are pure MSAA modes.

Using your reasoning, you'd have to claim the GF3's tested modes of 2xAA and 4xAA are all either super-sampling or Quincunx, which would then be trivial to disprove based on the benchmark figures and the differences in IQ.

And what a surprise, yet another site that backs my claims as MSAA being the base of Quincunx:

Since the GeForce3 uses a multisampling technique for antialiasing, the Quincunx option is not present for the GeForce and GeForce2 which perform antialiasing via supersampling.
Stop talking about Quincunx and linking to documents about it because it's an irrelevant tangent that occludes the core issue.
 

Scali

Banned
Dec 3, 2004
2,495
1
0
Originally posted by: BFG10K
Yes it is. It takes multiple color/z/stencil samples while only taking one shader & texture sample.

How can you take multiple colour samples when you only take one shader sample? The shader output IS the colour sample.

Originally posted by: BFG10K
Again, I still don't think you understand that these modes are pure MSAA modes

I think you don't understand that this depends on your definition of what a 'pure MSAA' mode is.

Originally posted by: BFG10K
Using your reasoning, you'd have to claim the GF3's tested modes of 2xAA and 4xAA are all either super-sampling or Quincunx, which would then be trivial to disprove based on the benchmark figures and the differences in IQ.

Uhh, what?
I fail to see the logic.
Why would ALL modes have to be either supersampling or Quincunx?
And are you now saying that regardless of whether an AA mode is 2x or 4x, if it is supersampling it will always have the same benchmark figures and no differences in IQ?

Originally posted by: BFG10K
Stop talking about Quincunx and linking to documents about it because it's an irrelevant tangent that occludes the core issue.

What exactly IS the core issue? Because you've completely lost me. You're arguing that Quincunx is based on MSAA?
Excuse me while I slap you in the face with an earlier post of mine:
"As far as I recall, GeForce3/4 only used that Quincunx thing, which nVidia may have called multisampling, but is not the same algorithm as the one used by ATi in the R300 and all later GPUs."
And then again this one:
""In graphics literature in general, "multisampling" refers to any special case of supersampling where some components of the final image are not fully supersampled. For example, a real-world multisampling implementation may also supersample stencil values."

The GeForce3/4 AA methods are special cases of supersampling, no doubt about that."

So what is your point, aside from continuing to argue things that I have LONG conceded to?
 

BFG10K

Lifer
Aug 14, 2000
22,709
3,007
126
Originally posted by: Scali

So you have two colour samples and 2 depth samples. Sounds like SSAA to me.
That's like saying CSAA is MSAA because both have coverage samples. :roll:

You're aware that MSAA is simply SSAA with the texture and shader samples decoupled from the base pattern, right?

They're not supersampling the zbuffer. The 5-sample filter then is purely on the colourbuffer.

How does that fit with:
"The specification dictates that the renderer evaluate one color, stencil, etc. value per pixel, and only "truly" supersample the depth value."
It doesn't, because we're not talking about Quincunx. You're the only one using that irrelevant strawman, so stop it.

However, since R300 and the standardization of multisampling in OpenGL and Direct3D, nobody uses nVidia's technique anymore, and when talking about 'multisampling' they always refer to the standardized technique.
That's right, except for the fact that everybody uses nVidia's technique. Furthermore it's not really "nVidia's" technique given the MSAA in use today is taking exactly the same sample types as it was on the GF3, just more of them, and with different grids.

The confusion is because GeForce3/4 came out in a time when this standard was not set yet. Hence talk of multisampling in those days could be a bit confusing.
There's no confusion here, other than you thinking Quincunx is somehow the same thing as the 2xRGMS/4xOGMS.

"In graphics literature in general, "multisampling" refers to any special case of supersampling where some components of the final image are not fully supersampled. For example, a real-world multisampling implementation may also supersample stencil values.?
Again, using that reasoning CSAA is a special case of SSAA. :roll:

Not only that, but it's completely irrelevant to what is being discussed.

The GeForce3/4 AA methods are special cases of supersampling, no doubt about that.
Uh, no; Quincunx is neither MSAA nor SSAA; it's MSAA + post-filter. The GF3's 2xAA & 4xAA modes, OTOH, are pure MSAA.

"nVidia only support their own special case of supersampling, not multisampling as it is standardized in OpenGL and Direct3D and known to everyone today".
This is patently false, and has been proven repeatedly to be so. All your irrelevant rhetoric and links to Wikipedia won?t change that.

Besides, the point was that ATi's algorithm was the first to actually make it EFFICIENT, and people actually started to USE AA (and AF) in games.
This is more semantic games, rhetoric, and goal-post shifting on your part. This is the typical behavior your arguments exhibit when you're cornered, which is why it seems like you "win". In reality, most people simply can't be bothered dealing with that crap anymore.
 

BFG10K

Lifer
Aug 14, 2000
22,709
3,007
126
Originally posted by: Scali

How can you take multiple colour samples when you only take one shader sample? The shader output IS the colour sample.
At this point it's clear to me that you have absolutely no idea what MSAA is, nor how it works.

I think you don't understand that this depends on your definition of what a 'pure MSAA' mode is.
There's only one definition, but that's not the one you're using. You're also the only one using this magic definition.

Why would ALL modes have to be either supersampling or Quincunx?
Because you're claiming they're not MSAA. What else was there at the time?

And are you now saying that regardless of whether an AA mode is 2x or 4x, if it is supersampling it will always have the same benchmark figures and no differences in IQ?
No, I'm saying the different benchmark scores show they can't all be Quincunx (since it's always five samples), and the differences in IQ compared to SSAA used on the GF2 show they can't be SSAA.

What exactly IS the core issue? Because you've completely lost me.
The core issue is that the GF3 was the first part to offer MSAA, not the R300 like you claimed. Furthermore it was MSAA, not some "different" definition you claim.

You're arguing that Quincunx is based on MSAA?
Excuse me while I slap you in the face with an earlier post of mine:
"As far as I recall, GeForce3/4 only used that Quincunx thing, which nVidia may have called multisampling, but is not the same algorithm as the one used by ATi in the R300 and all later GPUs."
Can you answer a simple question? Do you understand the concept of the 2x and 4x modes existing on the GF3, and that they're neither super-sampling nor Quincunx?

So what is your point, aside from continuing to argue things that I have LONG conceded to?
You haven't conceded anything; you're simply continuing to play semantic and rhetorical games.
 

Scali

Banned
Dec 3, 2004
2,495
1
0
Originally posted by: BFG10K
At this point it's clear to me that you have absolutely no idea what MSAA is, nor how it works.

Really? Why don't you explain to me where I'm wrong then?
How can you take a single shader sample and still end up with multiple colour samples?

Originally posted by: BFG10K
There's only one definition, but that's not the one you're using. You're also the only one using this magic definition.

The one used by OpenGL and Direct3D?
Here's a very nice explanation by Microsoft:
http://msdn.microsoft.com/en-u...S.85).aspx#Multisample
"To improve performance, per-pixel calculations are performed once for each covered pixel, by sharing shader outputs across covered sub-pixels."
So the pixelshader is run once, in the pixel center, for all subpixels of that pixel (so the pixelshader is sampled roughly in the center of all the z-values you're supersampling within that pixel; there's no direct link between z-values and colour values for the subpixels, as they are sampled at different positions AND at different frequencies).

Now you say there are multiple colour samples taken in nVidia's approach. But colour samples are part of the 'per-pixel calculations' done by the shader, correct?
So I'm having trouble fitting together what you say and what OpenGL/Direct3D say. Can you help me out here?

Originally posted by: BFG10K
Because you're claiming they're not MSAA. What else was there at the time?

I'm not claiming they're not MSAA in the broader definition.
I'm claiming that they're not MSAA in OpenGL/Direct3D's definition.

Originally posted by: BFG10K
No, I'm saying the different benchmark scores show they can't all be Quincunx (since it's always five samples), and the differences in IQ compared to SSAA used on the GF2 show they can't be SSAA.

Differences in IQ between different GPUs could have many different reasons. Not just the AA method used. It could be the same AA method with a different grid, or the differences could be caused by different precision in the pipeline, different texture filtering techniques, or many other things.
You can't just draw the conclusion that they can't be SSAA simply because they don't look exactly the same.
Same thing goes for the benchmarking... differences in performance can be caused by many things, not necessarily related to AA directly.

Originally posted by: BFG10K
Can you answer a simple question? Do you understand the concept of the 2x and 4x modes existing on the GF3, and that they're neither super-sampling nor Quincunx?

That's what I said, they're a special case of supersampling, and in general you could call such a case multisampling. As I said, I already conceded that many posts ago.
 

BFG10K

Lifer
Aug 14, 2000
22,709
3,007
126
Originally posted by: Scali

Really? Why don't you explain to me where I'm wrong then?
How can you take a single shader sample and still end up with multiple colour samples?
All of the samples are the same color, but they're stored multiple times, once for each sub-pixel.

The one used by OpenGL and Direct3D?
I'm not claiming they're not MSAA in the broader definition.
I'm claiming that they're not MSAA in OpenGL/Direct3D's definition.
Alright, let's try this another way. Please provide evidence that the GF3's 2x/4x modes are different to the R300's in the context of the data tied to each sample.

Not just the AA method used. It could be the same AA method with a different grid, or the differences could be caused by different precision in the pipeline, different texture filtering techniques, or many other things.
Then why don't you tell us what the differences are caused by? No theory, state them.

I've stated they're caused by the fact 2xAA/4xAA/2xQ are fundamentally different to SSAA, and I've provided repeated documentation to back this fact. Now you do the same. And no, posting some random spec which plays absolutely no part in this discussion is not documentation.

You can't just draw the conclusion that they can't be SSAA simply because they don't look exactly the same.
You certainly can by checking the areas where SSAA affects the image but MSAA doesn't. It's generally trivial to detect when MSAA is in action compared to SSAA. The author even did it for you.

That's what I said, they're a special case of supersampling, and in general you could call such a case multisampling.
By that same reasoning the R300's multi-sampling is simply a "special case" of SSAA. Likewise the GTX2xx series along with the HD4xxx series.

You still haven't retracted your original claims of the GF4 only offering SSAA and the R300 being the first to offer MSAA. All I see now is more semantic games and back-pedaling: "it's a special case of SSAA" and "it's not in the spec". Again, by that reasoning, CSAA is a special case of MSAA. Likewise I can claim the original Radeon supported CSAA and AAA given both are a "special case" of SSAA, again using your reasoning.
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
What makes you say that?

For chunking, rasterization consists of two steps: The first identifies which tiles a triangle touches, and the second rasterizes the triangle within each tile. So it's a two-stage process and I'm going to discuss the two stages separately.

Abrash says as much, although it isn't quite the same as PowerVR. Another quote:

If all three edges are negative at their respective trivial accept corners, then the whole tile is inside the triangle, and no further rasterization tests are needed -- and this is what I meant earlier when I said the rasterizer takes advantage of CPU smarts by not-rasterizing whenever possible. The tile-assignment code can just store a draw-whole-tile command in the bin. Then the bin rendering code can simply do the equivalent of two nested loops around the shaders, resulting in a full-screen triangle rasterization speed of approximately infinity -- one of my favorite performance numbers!

Again, from Abrash (emphasis mine).

Ironically enough, raytracing is actually a rather slow way of determining visibility.

Rather slow is a nice way of saying it; horrific, catastrophic, and lousy are others.

Software such as 3dsmax uses a z-buffered rasterizer for all the direct visibility, and uses rays only for reflections, lighting, shadows and such.

For diffuse lighting, ray tracing sucks. Radiosity throttles it, and although I haven't used 3DS in a while, I would assume they still support it, or something comparable at least.

Now if we were to look at the Radeon 9500 Pro and 9600 Pro, they actually have slightly less bandwidth than the GeForce4 Ti4600.
Yet they STILL outperform the GeForce4 Ti4600 in 4xAA:
http://www.anandtech.com/showdoc.aspx?i=1812&p=12
How is that possible?

You didn't read the B3D link, did you? GeForce4 added hardware that improved performance on the acquisition of samples, along with increasing the quality of the positions of the samples taken, for 2x AA. 4x AA still used ordered-grid (OG) sampling patterns that were remnants of the NV1x cores, and it was still quite a bit slower at handling it. In the link you provided the GF4 bests the Radeons using 2x AA; it wouldn't stand a chance at doing that using any sort of SSAA, conditional or not (it wouldn't have the pixel throughput).
 

Scali

Banned
Dec 3, 2004
2,495
1
0
Originally posted by: BFG10K
All of the samples are the same color, but they're stored multiple times, once for each sub-pixel.

Okay, so technically it's one colour sample, replicated.
Now what bothers me is that I can't find any info regarding how nVidia does this.
According to this:
http://msdn.microsoft.com/en-u...S.85).aspx#Multisample
You run the pixelshader once in the pixel center.

Now getting back to the link at Beyond3D:
http://www.beyond3d.com/content/reviews/20/3

Note that this discusses GeForce4's improvements over GeForce3, so it probably doesn't cover exactly how GeForce3 does it...
All they say is that they take one TEXTURE sample for all the subsamples.
Now why would they specifically say texture-sample when it's a whole pixelshader that would only be run once for all subsamples? So I'm inclined to take this literally.
What I think nVidia did is this:
They start off just like regular supersampling, so they run a pixelshader for every subpixel. But they built in a 'shortcut' so that the texturing hardware doesn't do 4 separate fetches, but instead they sample one texel at the pixel center (which seems to be an improvement over GF3, which just sampled at the position of the first subsample, something that cannot be explained if you're only running a pixelshader at the center position... because why would it not have the center position already?), and forward the same result to all pixelshaders.

That is what the text tells me, when I take it literally.
It would also explain why nVidia's GeForce4 Ti4600 is faster than a Radeon 9500/9600Pro when no AA is performed, but slower when 4xAA is performed.
Namely, the GeForce4 Ti is doing more work per pixel, whereas the Radeon is saving processing power and bandwidth because it runs the entire shader only once per pixel, not once per sample.

Now, Beyond3D could be wrong, but their explanation is the most technical and in-depth I have seen on the net, so it's all I can go on.

Originally posted by: BFG10K
Alright, let's try this another way. Please provide evidence that the GF3's 2x/4x modes are different to the R300's in the context of the data tied to each sample.

I think the burden of proof is on you, actually. I don't have better info on the GF3/4 AA modes than what Beyond3D says.
I think we all agree that the R300 does exactly what the Microsoft Rasterization Rules specify.
So if you can dig up any info that proves that not only the textures, but also the pixelshaders on GF3/4 are run only once for the pixel center, then you can prove that it does MSAA according to the OpenGL/Direct3D rules. Then I was indeed wrong; then again, so were sites like Beyond3D, on which I had to base my understanding of it.

Originally posted by: BFG10K
By that same reasoning the R300's multi-sampling is simply a "special case" of SSAA.

Yes, strictly speaking that is true. The Wikipedia page I linked to makes it very clear that multisampling by definition means "a special case of supersampling".
However, the R300 multi-sampling is ALSO the same "special case" as in OpenGL and Direct3D standards, as are all other modern GPUs.
In other words, multisampling as we know it today.

Originally posted by: BFG10K
You still haven't retracted your original claims of the GF4 only offering SSAA and the R300 being the first to offer MSAA.

In a way I did, because I rephrased my claim, acknowledging that the original claim was inaccurate, and didn't quite reflect what I meant to say.
 

Scali

Banned
Dec 3, 2004
2,495
1
0
Originally posted by: BenSkywalker
For chunking, rasterization consists of two steps: The first identifies which tiles a triangle touches, and the second rasterizes the triangle within each tile. So it's a two-stage process and I'm going to discuss the two stages separately.

Abrash says as much, although it isn't quite the same as PowerVR. Another quote:

If all three edges are negative at their respective trivial accept corners, then the whole tile is inside the triangle, and no further rasterization tests are needed -- and this is what I meant earlier when I said the rasterizer takes advantage of CPU smarts by not-rasterizing whenever possible. The tile-assignment code can just store a draw-whole-tile command in the bin. Then the bin rendering code can simply do the equivalent of two nested loops around the shaders, resulting in a full-screen triangle rasterization speed of approximately infinity -- one of my favorite performance numbers!

Again, from Abrash (emphasis mine).

Seems to be a misunderstanding on your part.
He only talks about determining which pixels need to be rasterized within each tile.
That is simply the rasterization process itself.
There's just an optimization that if an entire tile is covered by a triangle, you know right away that all pixels have to be rasterized, so you don't have to do the iteration at the pixel level to generate the masks.
It doesn't try to solve the visibility in terms of depth complexity (how could you, you have to solve that at pixel level, not at triangle or tile level). That is still done with a zbuffer, handled inside the shaders.
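A minimal sketch of the trivial-accept test Abrash describes (hypothetical helper names; his actual Larrabee code is vectorized and differs in detail):

```cpp
#include <cstdio>

struct Tile { float x0, y0, x1, y1; };   // screen-space tile bounds

// Affine edge function for edge (ax,ay)->(bx,by) evaluated at (px,py).
// Negative means "inside" under the winding convention assumed here.
static float edgeAt(float ax, float ay, float bx, float by,
                    float px, float py) {
    return (px - ax) * (by - ay) - (py - ay) * (bx - ax);
}

// For each edge, evaluate at the tile corner where the edge function is
// largest (its gradient is (dy, -dx), so pick the corner along that
// gradient). If even that worst-case corner is inside for all three edges,
// every point of the tile is inside the triangle, and the binner can emit
// a draw-whole-tile command with no per-pixel coverage tests at all.
static bool triviallyAccepted(const Tile& t,
                              const float vx[3], const float vy[3]) {
    for (int e = 0; e < 3; ++e) {
        int n = (e + 1) % 3;
        float dx = vx[n] - vx[e], dy = vy[n] - vy[e];
        float cx = (dy > 0) ? t.x1 : t.x0;   // maximizes (px-ax)*dy
        float cy = (dx < 0) ? t.y1 : t.y0;   // maximizes -(py-ay)*dx
        if (edgeAt(vx[e], vy[e], vx[n], vy[n], cx, cy) >= 0)
            return false;                    // tile not fully inside this edge
    }
    return true;
}

int main() {
    // A large triangle that fully covers a 16x16 tile at the origin.
    const float vx[3] = {-100, 300, -100}, vy[3] = {-100, -100, 300};
    Tile tile{0, 0, 16, 16};
    std::printf("trivially accepted: %d\n", triviallyAccepted(tile, vx, vy));
}
```

Note that this only short-circuits coverage; per-pixel depth is still resolved with the z-buffer inside the shaders, as described above.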

Originally posted by: BenSkywalker
For diffuse lighting, ray tracing sucks. Radiosity throttles it, and although I haven't used 3DS in a while, I would assume they still support it, or something comparable at least.

They're using a hybrid.
First they use the scanline renderer to rasterize the scene (basically determining visibility of the triangles/pixels). While doing that, they also store the surface normal for each pixel, and some other info.
Then a second pass is made where the surface normals are used for raytracing.
You could say it's a "screen space raytracer".
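A rough sketch of that two-pass structure (toy types and stub functions are assumptions for illustration; the real 3dsmax pipeline is far more involved):

```cpp
#include <vector>

struct Vec3 { float x = 0, y = 0, z = 0; };
struct GBufferSample {        // what the rasterization pass stores per pixel
    bool  covered = false;    // did any triangle land on this pixel?
    float depth   = 0;
    Vec3  normal;             // surface normal, kept for the ray pass
};

// Stubs for the heavy lifting (illustrative only, not a real API).
static GBufferSample rasterizePixel(int x, int y) {        // z-buffered scanline pass
    return {true, 1.0f, {0, 0, 1}};
}
static Vec3 traceSecondaryRays(const GBufferSample& s) {   // reflections, shadows, ...
    return s.normal;
}

// Pass 1: rasterize, recording per-pixel surface data.
// Pass 2: shoot rays only from pixels that turned out to be visible,
// so no rays are spent on empty or occluded geometry.
static std::vector<Vec3> renderHybrid(int w, int h) {
    std::vector<GBufferSample> gbuf(static_cast<size_t>(w) * h);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            gbuf[static_cast<size_t>(y) * w + x] = rasterizePixel(x, y);

    std::vector<Vec3> image(gbuf.size());
    for (size_t i = 0; i < gbuf.size(); ++i)
        if (gbuf[i].covered)
            image[i] = traceSecondaryRays(gbuf[i]);
    return image;
}
```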

Originally posted by: BenSkywalker
You didn't read the B3D link, did you?

I did; it said it only sampled the textures at the pixel center. See my other post to BFG10K.
 

BFG10K

Lifer
Aug 14, 2000
22,709
3,007
126
Originally posted by: Scali

Okay, so technically it's one colour sample, replicated.
So you needed my explanation to tell you that. Why do you continue to argue about what constitutes the definition of MSAA, yet you don't even know how it works?

Now what bothers me is that I can't find any info regarding how nVidia does this.
Eh? nVidia's sample data is the same as ATi's sample data; that's why it's called MSAA.

Also the sampled data types are the same on the GF3 as they are on the GTX2xx/4xxx parts. Again, that's why it's called MSAA.

All they say is that they take one TEXTURE sample for all the subsamples.
Now why would they specifically say texture-sample when it's a whole pixelshader that would only be run once for all subsamples?
Because it is one texture sample and one shader sample, sampled from the pixel's center. Obviously if there's no shader running (common when that article was written) then it'll just be whatever fixed multi-textured result you've got.

What I think nVidia did is this:
They start off just like regular supersampling, so they run a pixelshader for every subpixel. But they built in a 'shortcut' so that the texturing hardware doesn't do 4 separate fetches, but instead they sample one texel at the pixel center, and forward the same result to all pixelshaders.
So not only do you not know what MSAA is, you're now making up theories on the spot and then claiming the burden of proof is on me to prove you wrong?

I honestly cannot believe I'm actually participating in this discussion.

I think the burden of proof is on you, actually.
LMFAO, nope.

There have been extensive links and commentary provided here thus far from Ben and myself proving you wrong. If you want to contest all of that, the burden of proof is with you. If you can't provide that proof then you need to retract your claims.

I don't have better info on the GF3/4 AA modes than what Beyond3D says.
So in other words you can't back what you've been saying all this time. In that case, you need to retract your claims, or at least state you don't really know how it works.

I think we all agree that the R300 does exactly what the Microsoft Rasterization Rules specify.
Any part that does regular MSAA (like the GF3) follows those "rules", assuming they don't exceed the part's DX spec level of course.

So if you can dig up any info that proves that not only the textures, but also the pixelshaders on GF3/4 are run only once for the pixel center, then you can prove that it does MSAA according to the OpenGL/Direct3D rules.
Of course they are; that's the definition of MSAA, namely the decoupling of both sample types from z/stencil/color.

Yes, strictly speaking that is true. The Wikipedia page I linked to makes it very clear that multisampling by definition means "a special case of supersampling".
However, the R300 multi-sampling is ALSO the same "special case" as in OpenGL and Direct3D standards, as are all other modern GPUs.
In other words, multisampling as we know it today.
The multi-sampling as we know it today was being done on the GF3 as far as sampled data types are concerned. No amount of semantic games will change that fact.

2xAA/4xAA on the GF3 was not SSAA; it was not Quincunx; it was MSAA. The sample types taken on the GF3 with those modes were exactly the same as any DX10 part running MSAA today.

In a way I did, because I rephrased my claim, acknowledging that the original claim was inaccurate, and didn't quite reflect what I meant to say.
"In a way" is not good enough. You were wrong then and you continue to be wrong now. You need to either retract your claims, or admit you don't know enough about the topic to continue the discussion.
 

BFG10K

Lifer
Aug 14, 2000
22,709
3,007
126
Originally posted by: Scali

Originally posted by: BFG10K
For diffuse lighting, ray tracing sucks. Radiosity throttles it, and although I haven't used 3DS in a while, I would assume they still support it, or something comparable at least.

They're using a hybrid.
First they use the scanline renderer to rasterize the scene (basically determining visibility of the triangles/pixels). While doing that, they also store the surface normal for each pixel, and some other info.
Then a second pass is made where the surface normals are used for raytracing.
You could say it's a "screen space raytracer".

Originally posted by: BFG10K
You didn't read the B3D link, did you?

I did; it said it only sampled the textures at the pixel center. See my other post to BFG10K.
I didn't post either of those quotes, so fix them please.
 

Scali

Banned
Dec 3, 2004
2,495
1
0
Originally posted by: BFG10K
So you needed my explanation to tell you that. Why do you continue to argue about what constitutes the definition of MSAA, yet you don't even know how it works?

As I said, it sounds to me like nVidia just runs the pixelshader multiple times to get the colour values, when I read sites like Beyond3D.
When you said it sampled multiple colours, I thought that was what you meant as well.

Originally posted by: BFG10K
Eh? nVidia's sample data is the same as ATi's sample data; that's why it's called MSAA.

Also the sampled data types are the same on the GF3 as they are on the GTX2xx/4xxx parts. Again, that's why it's called MSAA.

No, I need to find some info that confirms that it actually DOES run only one pixelshader and NOT just the texture fetches per pixel, and some info that explains why GF3, even if it runs only one pixelshader in the pixel center, doesn't sample its textures at the pixel center.

Originally posted by: BFG10K
Because it is one texture sample and one shader sample, sampled from the pixel's center. Obviously if there's no shader running (common when that article was written) then it'll just be whatever fixed multi-textured result you've got.

I'm not convinced of this. As I say, there's still the problem of the GF3 not sampling its textures at the pixel center.
There must be a source somewhere stating what happens when using pixelshaders and AA, right? Unless what Beyond3D says is true.

Originally posted by: BFG10K
There have been extensive links and commentary provided here thus far from Ben and myself proving you wrong.

No they didn't.
I fully agree with what Beyond3D says.
Bringing us back to why GF3 doesn't sample at the pixel center, and why there's no mention of what the pixelshaders do.
You're now asking me to contest something which you haven't backed up.
The burden of proof is on you. If you want to prove that pixelshaders do indeed only run once per pixel center, you'll have to produce some info.
As long as there is no info on what pixelshaders do, I will just go by what Beyond3D says: only textures are sampled once per pixel (and not even at the pixel center in the case of GeForce3).

If you want to contest all of that, the burden of proof is with you. If you can't provide that proof then you need to retract your claims.

Originally posted by: BFG10K
So in other words you can't back what you've been saying all this time. In that case, you need to retract your claims, or at least state you don't really know how it works.

I've given my explanation of how it works, which is backed by what Beyond3D says.
It's now up to you to prove that it doesn't work that way, and Beyond3D is wrong.
I can't prove that the pixelshaders are run for every subsample because the info on the pixelshaders simply isn't available. This also means that you can't prove that it doesn't run for every subsample.

The theory is however plausible, because it explains the huge performance hit that GeForce4 gets with AA.
A Radeon 9600Pro, of which we KNOW that it doesn't run pixelshaders for every subsample, only gets a minor hit when going from 2xAA to 4xAA.
This makes sense because mainly your zbuffer gets more work. You don't need to shade that many more pixels.
Since the GeForce4 is faster without AA, I think it's safe to assume that the GeForce4 has better shader/zbuffer performance than Radeon per-pixel. The GeForce4 also has the bandwidth advantage.
Therefore the explanation why GeForce4 is slower in 4xAA seems to be in the fact that it does a lot more shading.

So if you can't produce any info that tells us how GeForce handles its pixelshaders with AA, how do you explain the fact that GeForce4 gets such a big performance hit from 4xAA, even though it has the bandwidth advantage, and also seems to be faster per pixel, judging from the performance without AA?

Originally posted by: BFG10K
Any part that does regular MSAA (like the GF3) follows those "rules", assuming they don't exceed the part's DX spec level of course.

Incorrect. Beyond3D specifically says that the GeForce3 does NOT sample its textures at the pixel center.

Originally posted by: BFG10K
Of course they are; that's the definition of MSAA, namely the decoupling of both sample types from z/stencil/color.

Again, in the case of GeForce3, textures are NOT decoupled from z.

Originally posted by: BFG10K
The multi-sampling as we know it today was being done on the GF3 as far as sampled data types are concerned. No amount of semantic games will change that fact.

Yet again there is this texture sampling that's not being done in the right place on GF3...
 

BFG10K

Lifer
Aug 14, 2000
22,709
3,007
126
Originally posted by: Scali

As I said, it sounds to me like nVidia just runs the pixelshader multiple times to get the colour values, when I read sites like Beyond3D.
Except they don't, because it's MSAA, which by definition doesn't do this.

No, I need to find some info that confirms that it actually DOES run only one pixelshader and NOT just the texture fetches per pixel, and some info that explains why GF3, even if it runs only one pixelshader in the pixel center, doesn't sample its textures at the pixel center.
Pardon? What on Earth are you talking about? Did you even read the B3D link?

Let me quote it for you:

Because multisampling only takes one texture sample for all the subsamples, under GeForce3's 2x & Quincunx schemes the texture sample would be exactly suitable for sample point 1 (being in the centre of the sample) but not very suitable for sample point 2 (as this is at the very edge of the pixel, bordering the next).
See point "1" in the GF3 picture, Scali? It's in the center. Even the commentary says so.

What part of this are you having trouble understanding? The definition of the word "center" perhaps? Or something else?

Not to mention that those samples aren't even texture samples, they're geometry samples. How can there be two texture samples when the review clearly states there's only one? Again, you simply have no idea what you're talking about, and it's simply comical how you continue this charade.

I'm not convinced of this.
Fortunately for us, reality doesn't hinge on whether you're convinced of anything.

As I say, there's still the problem of the GF3 not sampling its textures at the pixel center.
"...texture sample would be exactly suitable for sample point 1 (being in the centre of the sample)..."

I fully agree with what Beyond3D says.
So you agree the texture is in the center then, just like they stated?

Bringing us back to why GF3 doesn't sample at the pixel center,
Except it does, because it does MSAA.

As long as there is no info on what pixelshaders do, I will just go by what Beyond3D says: only textures are sampled once per pixel (and not even at the pixel center in the case of GeForce3).
"...texture sample would be exactly suitable for sample point 1 (being in the centre of the sample)..."

If you want to contest all of that, the burden of proof is with you. If you can?t provide that proof then you need to retract your claims.
Words... fail... me. Maybe I'll just quote something you said to someone else as a response to this:

Originally posted by: Scali

It seems deliberate (I give you the benefit of the doubt that you're not really THAT thick. Sadly that means that I think you are trolling and being obnoxious on purpose).
Yep, that sums it up quite nicely.

I've given my explanation of how it works, which is backed by what Beyond3D says.
No you haven't. You've made up utter bullshit that demonstrates an almost non-existent understanding of MSAA, along with the inability to grasp basic text and pictures.

This also means that you can't prove that it doesn't run for every subsample.
Actually you can, given it's trivial to demonstrate with any good AA tester app.

This makes sense because mainly your zbuffer gets more work. You don't need to shade that many more pixels.
It makes sense to whom? To the guy that states that something isn't the center, when it is?

Incorrect. Beyond3D specifically says that the GeForce3 does NOT sample its textures at the pixel center.
"...texture sample would be exactly suitable for sample point 1 (being in the centre of the sample)..."

Again, in the case of GeForce3, textures are NOT decoupled from z.
Yes they are, given there's only one texture sample but multiple and unique z values. Again, that's what MSAA is.

Yet again there is this texture sampling that's not being done in the right place on GF3...
"...texture sample would be exactly suitable for sample point 1 (being in the centre of the sample)..."
 

SirPauly

Diamond Member
Apr 28, 2009
5,187
1
0
This is all fantastic and all, and around we go in circles, but the discussion to me is nVidia's decisions as they are constituted right now, right or wrong; and is their future for Cuda and PhysX bleak indeed?


 

Scali

Banned
Dec 3, 2004
2,495
1
0
Originally posted by: BFG10K
Except they don't, because it's MSAA, which by definition doesn't do this.

I thought we had agreed that multisample could mean any kind of special case supersampling, so it's not really a definition as such, outside of the context of rasterization rules of a major API.

Originally posted by: BFG10K
Pardon? What on Earth are you talking about? Did you even read the B3D link?

Let me quote it for you:

Because multisampling only takes one texture sample for all the subsamples, under GeForce3's 2x & Quincunx schemes the texture sample would be exactly suitable for sample point 1 (being in the centre of the sample) but not very suitable for sample point 2 (as this is at the very edge of the pixel, bordering the next).
See point "1" in the GF3 picture, Scali? It's in the center. Even the commentary says so.

What part of this are you having trouble understanding? The definition of the word "center" perhaps? Or something else?

Only because they fudged the pattern so it still works. Wasn't the key to MSAA that colour and z are decoupled? They aren't here; that's why you can only make it work with these fudged patterns, where you always need to sample in the pixel center.

The problem is with the Quincunx diagram. Samples 1 and 2 would have the correct texel, but 3, 4 and 5 don't (they are samples taken from adjacent pixels).

Another problem is that the samples aren't evenly distributed. With one depth sample at the center and one at the top left corner, your average position is halfway between them. But your colour sample is based on the center.
On GF4 they corrected that by having the samples evenly distributed around the center of the pixel, as it should be.

Originally posted by: BFG10K
Not to mention that those samples aren't even texture samples, they're geometry samples.

They're geometry samples yes, but that's the problem... They aren't decoupled.

Originally posted by: BFG10K
Except it does, because it does MSAA.

It samples at the first z sample position. It's ONLY at the pixel center if your sampling pattern is organized that way.
Why isn't it decoupled if GF3 supports "MSAA as we know it today"?

Originally posted by: BFG10K
No you haven't. You've made up utter bullshit that demonstrates an almost non-existent understanding of MSAA, along with the inability to grasp basic text and pictures.

Oh please. I've proven that I know perfectly well how MSAA works in OpenGL and Direct3D. My understanding of MSAA was never in doubt and you know it. You're just pissed that I caught you out on your mistakes earlier in the thread, so now you latch on to this non-issue.
The problem is that you can't answer some simple questions about the mysteries of GF3/4.

Originally posted by: BFG10K
Yes they are, given there's only one texture sample but multiple and unique z values. Again, that's what MSAA is.

Decoupled from the POSITION obviously, Mr. self-proclaimed MSAA-expert.
That's the point. With MSAA you ALWAYS sample in the pixel-center, regardless of the sampling pattern of your supersampled depthbuffer.
Obviously patterns with a sample in the center are generally sub-optimal. Which nVidia realized and improved with the GeForce4.
But why did they need to fix that in the GeForce4 if according to you GeForce3 already did full MSAA as we know it, meaning full freedom in the sampling pattern?
 

evolucion8

Platinum Member
Jun 17, 2005
2,867
3
81
I'm confused. When you guys talk about pixel shading for color sampling, is it the pixel shader unit that does that job? I thought it was performed by a fixed-function unit like the ROPs. GF3/GF4 had very limited pixel shading functionality compared to the Radeon 8500.
 

Scali

Banned
Dec 3, 2004
2,495
1
0
Originally posted by: evolucion8
I'm confused. When you guys talk about pixel shading for color sampling, is it the pixel shader unit that does that job? I thought it was performed by a fixed-function unit like the ROPs. GF3/GF4 had very limited pixel shading functionality compared to the Radeon 8500.

Limited or not, it had a pixel shading pipeline.
I'm not sure about the exact implementation of GF3/4, but it appears that most of the instructions in the ps1.1/ps1.3 standard are just generalizations of the fixed-function shading options.
So likely it uses the pixelshader pipeline to perform all texturing and shading, regardless of whether the programmer writes an actual shader program or not (in the case of fixed-function code, the driver will just use standard shader programs).
I know for a fact that this is how video cards have done it since the R300, but I'm not entirely sure if it also goes for GF3/4 and the Radeon 8500. It seems more likely than not, though (else you'd just have a lot of duplicate hardware).
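To illustrate the idea, here is a toy sketch of how a driver might lower fixed-function texture stage ops into shader-style instructions (the stage op names echo real Direct3D ones, but the lowering code itself is hypothetical):

```cpp
#include <string>

// A fixed-function texture stage operation, roughly in Direct3D terms
// (cf. D3DTOP_SELECTARG1 / D3DTOP_MODULATE / D3DTOP_ADD).
enum class StageOp { SelectArg1, Modulate, Add };

// Emit an equivalent ps.1.1-style instruction for one stage. A real driver
// would build GPU-specific microcode; strings keep the sketch readable.
std::string lowerStage(StageOp op, int stage) {
    std::string t = "t" + std::to_string(stage);   // texture register
    std::string r = "r0";                          // running color result
    switch (op) {
        case StageOp::SelectArg1: return "mov " + r + ", " + t;
        case StageOp::Modulate:   return "mul " + r + ", " + r + ", " + t;
        case StageOp::Add:        return "add " + r + ", " + r + ", " + t;
    }
    return "";
}
```

Under this view, a fixed-function MODULATE stage comes out as the same mul a programmer could have written in a ps.1.1 shader, which is why a single pixel pipeline can serve both paths.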