"Inevitable Bleak Outcome for nVidia's Cuda + Physx Strategy"


Scali

Banned
Dec 3, 2004
2,495
0
0
Originally posted by: SickBeast
CUDA is already on the Ion platform, because they have lousy CPUs and need all the help they can get.

Yea, but even nVidia doesn't really know what it's good for, outside of video encoding:
http://www.nvidia.com/object/sff_ion.html

"NVIDIA® CUDA? technology to accelerate the most demanding applications, vastly improving your PC?s ability to work with visual content such as video encoding"
 

SickBeast

Lifer
Jul 21, 2000
14,377
19
81
Originally posted by: Scali
Originally posted by: SickBeast
CUDA is already on the Ion platform, because they have lousy CPUs and need all the help they can get.

Yea, but even nVidia doesn't really know what it's good for, outside of video encoding:
http://www.nvidia.com/object/sff_ion.html

"NVIDIA® CUDA? technology to accelerate the most demanding applications, vastly improving your PC?s ability to work with visual content such as video encoding"

Video encoding is the only consumer-level software that uses CUDA so far, but there are tons of scientific and game development applications for it.
 

Scali

Banned
Dec 3, 2004
2,495
0
0
Originally posted by: SickBeast
Video encoding is the only consumer-level software that uses CUDA so far, but there are tons of scientific and game development applications for it.

Yea I know, but are you going to run those on a netbook or something?
Games probably won't run anyway, because Cuda is no substitute for a CPU, and the Atom simply isn't fast enough, even if you'd strap a GTX295 to it.
And scientific stuff... you want to do that at a desk I assume... and with more powerful hardware.

Let alone the "Dick Tracy watch" that was mentioned.
What are we going to do, encode videos with our watch all day long?

As I say, to me it seems like a solution without a problem. Not quite the technology that's going to boost handheld devices, like Nemesis believes.
 

SickBeast

Lifer
Jul 21, 2000
14,377
19
81
Originally posted by: Scali
Originally posted by: SickBeast
Video encoding is the only consumer-level software that uses CUDA so far, but there are tons of scientific and game development applications for it.

Yea I know, but are you going to run those on a netbook or something?
Games probably won't run anyway, because Cuda is no substitute for a CPU, and the Atom simply isn't fast enough, even if you'd strap a GTX295 to it.
And scientific stuff... you want to do that at a desk I assume... and with more powerful hardware.

Let alone the "Dick Tracy watch" that was mentioned.
What are we going to do, encode videos with our watch all day long?

As I say, to me it seems like a solution without a problem. Not quite the technology that's going to boost handheld devices, like Nemesis believes.

Well, my 8800GTS 320mb can encode a 2h movie in under 30 minutes, so I figure that a GTX295 could do it in 10.

The 8600M that's strapped to the current Ion platform could probably do a 2h movie in a few hours, rather than the 20+ hours it would take the Atom to do it. It's useful for certain people IMO, and its usefulness will only increase over time.

The Ion is actually a pretty balanced gaming platform and would dust most notebooks in terms of gaming performance.
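
For what it's worth, here's the back-of-envelope version of that guess, scaling purely by stream-processor count (96 on the 8800 GTS 320MB vs 2x240 on a GTX 295) and ignoring clocks, memory bandwidth and the CPU-bound part of the encoder, so treat it as a best case rather than a measurement:

// Naive encode-time scaling by shader count only (illustrative, not a benchmark).
#include <cstdio>

int main() {
    const double gts_minutes = 30.0;   // observed: ~2h movie in under 30 min on the 8800 GTS
    const double gts_sps     = 96.0;   // stream processors, 8800 GTS 320MB
    const double gtx295_sps  = 480.0;  // 2 x 240, assuming perfect dual-GPU scaling

    std::printf("GTX 295 estimate: %.1f minutes\n",
                gts_minutes * gts_sps / gtx295_sps);   // ~6 minutes, so 10 is a safe bet
    return 0;
}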
 

Scali

Banned
Dec 3, 2004
2,495
0
0
Originally posted by: SickBeast
Well, my 8800GTS 320mb can encode a 2h movie in under 30 minutes, so I figure that a GTX295 could do it in 10.

The 8600M that's strapped to the current Ion platform could probably do a 2h movie in a few hours, rather than the 20+ hours it would take the Atom to do it. It's useful for certain people IMO, and its usefulness will only increase over time.

I wasn't talking about video encoding.
Video encoding is a good case for GPGPU. We all know that. But I don't really see that with handheld devices. Generally you want to encode the video BEFORE you put it on a small device, because they have limited storage, not to mention limited battery life. You don't want to spend an hour encoding a video just so you can watch it, only to find that the batteries are almost dead anyway.

But other than that, what good is GPGPU going to be? For most users it's even rather pointless on the desktop. It's not going to do anything for your email, your web browsing, your office suite, your IM, etc.

I think handhelds may benefit more from accelerated graphics and video playback. But you don't need OpenCL/Cuda for that.
 

Idontcare

Elite Member
Oct 10, 1999
21,110
59
91
Originally posted by: Scali
What annoys me is that instead of answering my simple question, he rants on like a lunatic about nVidia, Tegra and whatever.
Who cares? PowerVR or Tegra or whatever other chip you can put into a handheld device and run OpenCL on... WHAT are you going to run on it? I don't see it!
As far as I can see it, OpenCL on a handheld is a solution without a problem.

I feel your frustration, see you are trying to employ linear rationalization to an otherwise nebulous and fuzzy stream of thought.

I'm not saying he's like a child, but I say that because I didn't "get" how to interact with him until my daughter turned 4 and I struggled to figure out how to "play" with her in her world. She is incredibly imaginative, and she sees the world in a way that my linear way of processing just fails to generate.

Once I figured out how to have fun interacting with an individual who sees the world in a very non-linear (from my POV) system of interactions, like my daughter does, I was finally able to engage with, and enjoy, the views he takes the time to share with us.

Again I'm not saying Nemesis is anything like a child, I am saying he is incredibly imaginative and the stream of thought that comes out of that does seem, exactly as you say, like a lunatic ranting about "whatever" at times.

If you find it raising your blood-pressure then you are wasting your time, don't do that to yourself. Just let it go, it won't offend him from what I've gathered. But posting and trying to "cage" him in a web of logic is futile as well, your powers hold no authority there. :laugh:

Imagine trying to have a conversation with someone like Van Gogh, only he's interested in technology but has some really intriguing interpretations of the industry. Frustrating? Sure if you insist on him talking about tech like your coworkers do. Can it be fun? Yes, it stretches the mind. Can you learn something from it? Every journey offers an opportunity to learn something, at the bare minimum. What have you got to lose?
 

Scali

Banned
Dec 3, 2004
2,495
0
0
Originally posted by: Idontcare
I feel your frustration, see you are trying to employ linear rationalization to an otherwise nebulous and fuzzy stream of thought.

Well I've already given that up.
I tend to be more linear and rational than most people anyway.
I suppose someone like Nemesis and I just don't mix.
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Originally posted by: SickBeast
Originally posted by: Scali
Originally posted by: Nemesis 1
I want my damn Dick Tracy watch. OpenCL will bring that vision.

How?
You say OpenCL is going to be big on these small devices, I just asked why.

CUDA is already on the Ion platform, because they have lousy CPUs and need all the help they can get.

I wouldn't have said it that way, but yeah. Scali, what do you want from me? Havok using physics, for one. I didn't want to mention it. I don't like debating this stuff, but when people try to close out other techs, claiming they invented it all, I've got to step up. I really like tech. If man has any chance of escaping perdition, it's through tech that it will happen. I like the idea of a different future than hellfire and brimstone.

Idontcare, thanks for your humble attempts at extricating where I see tech going from my infirm attempts at explaining it. You're right, I had a good run going, but I have come to a derailment.

No 3D gates at 32nm. I don't see the EPIC/VLIW backend that's been talked about in public coming from Intel. So without those, my whole idea of the future is nonexistent. So it was only a short run. Remember, I was basing everything off of the Elbrus compiler and later work done by Intel. Now Haswell gives me hopes, but it's not likely. Larrabee really has me twisted; there's just something about it I am missing. I understand that Larrabee takes long code and breaks it up into smaller chunks, but it can still do long code. That means to me (and I may be missing something here) that if Larrabee can do long code, the backend cannot be CISC. So I'm just confused.

Does Larrabee have two backends, one for the vector unit and one for x86? Or does the recompile of x86 running on the vector unit change x86 code into vector code read by the backend?

 

SickBeast

Lifer
Jul 21, 2000
14,377
19
81
The key to Larrabee's success will be the way in which DX11 is coded. If it does in fact give Larrabee an advantage by making the shader code x86-based, then it has a chance to dethrone NV and AMD and could be a top performer. If MS goes with the status quo (which is the far more likely scenario) then the Larrabee is going to have a very hard time and will be at a 50% disadvantage per transistor as others have pointed out.
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
Why can't they make it efficient for everyone? This is the part I don't like, but at the same time I love it. If MS slips up and doesn't support Intel, then MS runs the risk of Intel/Apple doing some headbanging. It's interesting times.
 

SickBeast

Lifer
Jul 21, 2000
14,377
19
81
I think that the Larrabee will have some uses, but it will not be a very good gaming GPU IMO. It will probably be great for web servers, video editors, and people who create 3D graphics, but for the rest of us, it will suck compared with what NV/AMD have.
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
I think maybe you're right, except for Intel games. You know, games produced for the 80% market leader. Now I am not saying that there are suck-ups running the gaming community, but I can see a few trying to butter Intel's bread. Can't you? I mean, Intel's a monopoly, for crying out loud. The game industry is going to ignore Mr. Big Bucks?

You could be right. Me, I turn around, look at the man, and say no. There'll be lots of ass-kissers.
 

SickBeast

Lifer
Jul 21, 2000
14,377
19
81
Unless the Larrabee is in the PS4 then I think any developer that makes something for it at this point is nuts.

Intel would have to fund the game themselves which would be equally insane.
 

thilanliyan

Lifer
Jun 21, 2005
12,040
2,256
126
Originally posted by: SickBeast
Well, my 8800GTS 320mb can encode a 2h movie in under 30 minutes, so I figure that a GTX295 could do it in 10.

Transcode, you mean. AFAIK there aren't any apps yet for actual encoding, which is different from transcoding. Correct me if I'm wrong.
 

Scali

Banned
Dec 3, 2004
2,495
0
0
Originally posted by: SickBeast
The key to Larrabee's success will be the way in which DX11 is coded. If it does in fact give Larrabee an advantage by making the shader code x86-based, then it has a chance to dethrone NV and AMD and could be a top performer. If MS goes with the status quo (which is the far more likely scenario) then the Larrabee is going to have a very hard time and will be at a 50% disadvantage per transistor as others have pointed out.

DX11 is little more than DX10 with some extra features.
But I don't really see why Larrabee would be at such a disadvantage. It has very wide SIMD units, almost exactly like the ones that nVidia's GPUs have.
I think we can also assume that Intel has more advanced floating point technology, so Intel's SIMD units will likely be faster/smaller/more efficient than nVidia's.
And while it may not have any fixed-function hardware outside of texture samplers, it does have the classic x86 ALU pipelines and HyperThreading. So it has the ability to run conventional code alongside SIMD code. I can see that working for rasterizing and things. Since they also rasterize in a different way from how nVidia and ATi do it (tile-based), they should get more of an advantage from the caches.

I don't think Larrabee is going to have such a hard time, really. It probably won't be the fastest GPU on the market, but rather a decent midrange performer. Intel should be able to absorb a slight disadvantage in terms of transistor count or such. After all, they are always ahead of the rest in manufacturing. And they're HUGE.

All that doesn't take any x86 into account yet. That would be 'phase 2', when Larrabee is on the market and people are buying it because it's a GPU with good price/performance. THEN you may see developers starting to use custom-made x86 code for Larrabee. But at first it will just be DX.
 

Nemesis 1

Lifer
Dec 30, 2006
11,366
2
0
I agree with Scali here. On DX, Larrabee will probably be slower, but not in games programmed for it. Project Offset should be a hybrid of sorts made exclusively for Intel; it's going to do fine. I have read 6 titles are coming out for Larrabee. They must be recompiled old games; no way are they 6 new game titles. So we wait and see. Larrabee and PowerVR multi-GPU seem to have much in common.

You're right, DX11 is more than DX10. It's a subset of DX10.1, to be exact. It's actually DX10 before somebody cried like a baby and MS tamed it down.
 

masterobiwankenobi

Junior Member
May 19, 2009
11
0
0
Originally posted by: Nemesis 1
I agree with Scali here. On DX, Larrabee will probably be slower, but not in games programmed for it. Project Offset should be a hybrid of sorts made exclusively for Intel; it's going to do fine. I have read 6 titles are coming out for Larrabee. They must be recompiled old games; no way are they 6 new game titles. So we wait and see. Larrabee and PowerVR multi-GPU seem to have much in common.

You're right, DX11 is more than DX10. It's a subset of DX10.1, to be exact. It's actually DX10 before somebody cried like a baby and MS tamed it down.


I will eat your nuts if 6 titles come out for Larrabee this year or next. :disgust::roll:
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
But if Larrabee can do long code, the backend cannot be CISC.

I've explained this to you before, but you don't seem to like most of what I have to say :)

Intel's processors do not execute any actual x86 code, nor have they in a very long time. They have decode hardware on the front end that converts the native x86 code into micro-ops, which the processor then executes. This is one of the staggering drawbacks to Larrabee, but the only way possible to execute x86 code in an environment like the one they are shooting for.

DX11 is little more than DX10 with some extra features.

I would say that is an utterly horrific inaccuracy. I think you could say D3D11 is little more than D3D10 with some extra features, but CS alone changes the dynamics of DX overall by a rather profound amount.

But I don't really see why Larrabee would be at such a disadvantage. It has very wide SIMD units, almost exactly like the ones that nVidia's GPUs have.

Larrabee isn't going up against current nV GPUs; it is going up against MIMD-based nV GPUs, so it isn't a very fair comparison.

And while it may not have any fixed-function hardware outside of texture samplers, it does have the classic x86 ALU pipelines and HyperThreading. So it has the ability to run conventional code alongside SIMD code. I can see that working for rasterizing and things.

How many instructions per core/clock can it retire? Roughing it out, a Larrabee with 64 cores at 2GHz worked out to be, at best, comparable to an nVidia 9500GT with 16x AF, IIRC, based on the specs Intel has stated. Between the decode hardware and the limited ability to retire enough instructions per clock, there is no doubt Larrabee is going to face some rather staggering hurdles to compete even with midrange parts.

Since they also rasterize in a different way from how nVidia and ATi do it (tile-based), they should get more of an advantage from the caches.

What are they going to use, about 60-100MB of on-die cache? They will either need that or have monstrous bandwidth, with data passing from the tessellator to the 'bin' and then back to the chip for the surface visibility check; a very messy way of dealing with rendering under D3D11. TBRs' problem has always been that they don't deal with large amounts of geometry well, a problem rasterizers have no issue with (theirs has always been dealing with overdraw, but that has largely been 'solved').
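
To make the binning cost concrete, here's a toy sketch (my own illustration, not Intel's or PowerVR's code; the binTriangles name and 64-pixel tile size are made up) of what a TBR has to do before it can shade a single tile: every transformed triangle gets pushed into a bucket for each screen tile its bounding box touches, and all of those buckets have to sit somewhere until the frame's geometry is done. The storage grows with the amount of geometry, which is exactly the scaling problem.

// Minimal triangle-to-tile binning sketch for a tile-based renderer.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <vector>

struct Tri { float x[3], y[3]; };   // a real renderer also keeps per-vertex attributes

constexpr int TILE = 64;            // hypothetical tile size in pixels

// Bin each triangle into every tile its screen-space bounding box overlaps.
std::vector<std::vector<uint32_t>> binTriangles(const std::vector<Tri>& tris,
                                                int width, int height) {
    const int tilesX = (width  + TILE - 1) / TILE;
    const int tilesY = (height + TILE - 1) / TILE;
    std::vector<std::vector<uint32_t>> bins(tilesX * tilesY);

    for (uint32_t i = 0; i < tris.size(); ++i) {
        const Tri& t = tris[i];
        const float minX = std::min({t.x[0], t.x[1], t.x[2]});
        const float maxX = std::max({t.x[0], t.x[1], t.x[2]});
        const float minY = std::min({t.y[0], t.y[1], t.y[2]});
        const float maxY = std::max({t.y[0], t.y[1], t.y[2]});

        const int tx0 = std::max(0, (int)minX / TILE);
        const int tx1 = std::min(tilesX - 1, (int)maxX / TILE);
        const int ty0 = std::max(0, (int)minY / TILE);
        const int ty1 = std::min(tilesY - 1, (int)maxY / TILE);
        for (int ty = ty0; ty <= ty1; ++ty)
            for (int tx = tx0; tx <= tx1; ++tx)
                bins[ty * tilesX + tx].push_back(i);   // triangle index, duplicated per tile
    }
    return bins;
}

int main() {
    std::vector<Tri> tris;
    tris.push_back({{10.f, 40.f, 200.f}, {10.f, 120.f, 60.f}});   // one wide triangle

    auto bins = binTriangles(tris, 1280, 1024);
    size_t touched = 0;
    for (const auto& b : bins) touched += b.empty() ? 0 : 1;
    std::printf("tiles referencing triangle 0: %zu\n", touched);
    return 0;
}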

Intel should be able to absorb a slight disadvantage in terms of transistor count or such. After all, they are always ahead of the rest in manufacturing. And they're HUGE.

Without a pixel pipeline they are going to need to be significantly more powerful than nV or ATi to compete. Even as a TBR, PowerVR was never foolish enough to think of releasing a part without a pixel pipe; you are giving up WAY too much performance versus a rasterizer. Sure, if all games swapped over to RTR it would have merits, but RTR has very little in the way of clear-cut advantages, and most of those are easily offset by its disadvantages.

I think maybe you're right, except for Intel games. You know, games produced for the 80% market leader.

For the actual games market right now, Intel is at best fourth. IBM, ATi and nVidia are all far more powerful in terms of the hardware people play games on. Intel doesn't dominate; they happen to have a slight performance edge in the one class of gaming that has been in a rather rapid state of decline, and that is it. Intel may be a monster in the commodity CPU market, but they are a marginal player in the gaming market at best.
 

Scali

Banned
Dec 3, 2004
2,495
0
0
Originally posted by: BenSkywalker
I would say that is an utterly horrific inaccuracy.

It is, if you pull it out of context.
My point was that DX11 doesn't really allow you to take advantage of Larrabee's x86-architecture (which is how I interpreted the statement I responded to).
Compute shaders may be new to DX11, but they are supported on current DX10 hardware, so there's no x86-benefit to it.
DX11 is not the DX that will allow you to run Larrabee-specific code for great justice.

Originally posted by: BenSkywalker
Larrabee isn't going up against current nV GPUs; it is going up against MIMD-based nV GPUs, so it isn't a very fair comparison.

Well, we'll have to see about that. nVidia's current midrange is based on the G92 chip, not on the GT200. If it stays like that, then when G300 comes out, (a 40 nm version of?) the GT200 may be the midrange part at the time Larrabee comes out.

Originally posted by: BenSkywalker
What are they going to use, about 60-100MB of on-die cache? They will either need that or have monstrous bandwidth, with data passing from the tessellator to the 'bin' and then back to the chip for the surface visibility check; a very messy way of dealing with rendering under D3D11.

You'll have to read the article that Michael Abrash wrote about the rasterizer recently. It will at least give you SOME insight in how the binning and rasterizing is done.

My remark however was not related to binning, since I think they got that 'solved' nicely, judging by what Abrash wrote. My remark was related to how a tile will fit in the cache, giving you really good z/stencil and blending performance.
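
Some rough numbers to show why a tile can sit in a core's cache (my own figures; the actual tile size and formats are whatever the driver picks per render target):

// Tile footprint estimate for a cache-resident color + depth/stencil tile.
#include <cstdio>

int main() {
    const int tileW = 128, tileH = 128;   // hypothetical tile dimensions
    const int bytesColor = 4;             // RGBA8
    const int bytesDepthStencil = 4;      // D24S8
    const int msaa = 1;                   // samples per pixel

    const int tileBytes = tileW * tileH * (bytesColor + bytesDepthStencil) * msaa;
    std::printf("tile footprint: %d KB\n", tileBytes / 1024);   // 128 KB at 1x, 512 KB at 4x
    return 0;
}

At 1x that's 128 KB, which fits in the 256 KB per-core L2 Intel has described with room to spare; at 4x MSAA it balloons to 512 KB, so presumably the driver just picks a smaller tile to stay cache-resident.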

Originally posted by: BenSkywalker
TBRs' problem has always been that they don't deal with large amounts of geometry well

As far as I know, there's never been a TBR with specific hardware to handle geometry.
All TBR devices I know of, were limited to CPU T&L, and as such the CPU also had to take care of most of the sorting/binning tasks.
Larrabee will do all this on-chip, with a parallel approach, so it scales much better to large amounts of geometry. Again, I'll have to refer to Michael Abrash's article for more details.

Originally posted by: BenSkywalker
Without a pixel pipeline they are going to need to be significantly more powerful than nV or ATi to compete.

What exactly do you mean by a 'pixel pipeline' in this case, and what is it that makes it so important to you?
The Abrash article gave a nice explanation of how they prepare the masks for the SIMD units so they can render 4x4 blocks of pixels much like how nVidia and ATi do it.
So I don't really see a big difference there.
I think the main question will be: is Intel's SIMD solution going to be as good or better than the nVidia and ATi units?
I really don't think it's going to be bottlenecked by rasterization much. This is of course assuming the actual pixel shading is nontrivial, so some of the latency/overhead of Intel's approach can be hidden. But with a shader-heavy game like Crysis, that should be no problem.
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
My point was that DX11 doesn't really allow you to take advantage of Larrabee's x86-architecture (which is how I interpreted the statement I responded to).
Compute shaders may be new to DX11, but they are supported on current DX10 hardware, so there's no x86-benefit to it.
DX11 is not the DX that will allow you to run Larrabee-specific code for great justice.

What is missing in DX11 that would allow more from Larrabee? I'm not seeing the limitations here; in fact, some of it seems to go beyond what Larry would even handle comfortably (OoO I/O).

Well, we'll have to see about that. nVidia's current midrange is based on the G92 chip, not on the GT200. If it stays like that, then when G300 comes out, (a 40 nm version of?) the GT200 may be the midrange part at the time Larrabee comes out.

The GTX 260 is $150; what do you consider midrange exactly? Normally the $200 price point has been midrange, and the GTX 260 is pushing down into the low-end market segment.

You'll have to read the article that Michael Abrash wrote about the rasterizer recently. It will at least give you SOME insight in how the binning and rasterizing is done.

Binning is done no differently as far as geometry is concerned, based on what Abrash wrote; actually, it appears their setup is considerably less efficient than what PowerVR has done in the past, but part of that is a necessity due to the lack of rasterization support. In terms of actual memory utilization and bandwidth requirements, Larrabee will simply make sure that the maximum bin space possible is always required. It is really a flat-out horrible design for a GPU. Really, where are you keeping all of the geometric data for visibility checks on all of your blocks/chunks?

What he doesn't get into, and where I have always said Larry was going to run into problems, is in applying AF. You cannot ship a part today without AF and expect to even compete with nV/ATi's integrated solutions. Without a pixel pipe, Larry is going to get killed here. PowerVR had a very similar problem with more advanced filtering. Sheer raw IPS is what you need, and Larry falls shockingly short.

My remark however was not related to binning, since I think they got that 'solved' nicely, judging by what Abrash wrote. My remark was related to how a tile will fit in the cache, giving you really good z/stencil and blending performance.

You will have massive thrashing for cross-block/chunk polys unless you have a huge cache on die.

Larrabee will do all this on-chip, with a parallel approach, so it scales much better to large amounts of geometry.

Either it needs to have a massive amount of on-die cache, or it will need to transform all geometric data (after tessellation, which is where things will get very expensive), write it back to board memory, and then bounce around to handle checks. The tiles themselves have never been the problem; you can't check the visibility of a poly without knowing where the other polys in the scene are going to be. All of the geometric data must be binned.

What exactly do you mean with a 'pixel pipeline' in this case, and what is it that makes it so important to you?

In particular, the blending ops after the texel sample phase. That is where Larry is going to roll over and die without pipelines, which it doesn't have. It cannot handle that level of execution with their architecture and be remotely competitive.

The Abrash article gave a nice explanation of how they prepare the masks for the SIMD units so they can render 4x4 blocks of pixels much like how nVidia and ATi do it.

Getting the basic scene on screen isn't where I have ever seen a problem. Making it look tolerable is where I see problems.

But with a shader-heavy game like Crysis, that should be no problem.

With 16x AF I would wager very heavily that Larrabee will be utterly destroyed by current hardware; it will not be remotely close. Honestly, based on Abrash's article it would seem that, unlike with PVR's setup, they will also pay a decent performance penalty for MSAA (as each chunk will require 4x the setup work for 4x AA, whereas with PVR it was 'free').
 

Scali

Banned
Dec 3, 2004
2,495
0
0
Originally posted by: BenSkywalker
What is missing in DX11 that would allow more from Larrabee? I'm not seeing the limitations here; in fact, some of it seems to go beyond what Larry would even handle comfortably (OoO I/O).

I think the limitations are fairly obvious...
DX11 compute shaders only allow you to run GPGPU-like code. Larrabee has x86 cores which support more than what a regular GPGPU does. The branching and threading that an x86 core supports are way different from how they are handled with GPGPU code.
Heck, I think the way Larrabee works already points out what I mean:
The DX11 driver on Larrabee is just a piece of software!
You could change that software to make Larrabee do other things, but not within the constraints of DX11.

I think you see Larrabee just as a DX11 GPU, in which case you're obviously missing the point I'm driving at. A DX11 GPU is just one of the many things you can make Larrabee do.
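
To illustrate the branching difference I mean, here's a toy comparison in plain C++ (not actual LRBni, CUDA or DX11 CS code; the function names are made up): SIMD-style execution evaluates both sides of a branch for every lane and blends the results with a mask, while a scalar x86 thread just takes whichever branch it needs.

// Masked SIMD-style branching vs. ordinary scalar branching (illustrative only).
#include <array>
#include <cstdio>

constexpr int LANES = 16;   // Larrabee's vector width happens to be 16 floats

// GPGPU/SIMD style: both paths run for all lanes, then a mask picks the result.
void simd_style(std::array<float, LANES>& v) {
    std::array<bool, LANES> mask;
    for (int i = 0; i < LANES; ++i) mask[i] = v[i] > 0.0f;

    std::array<float, LANES> thenPath, elsePath;
    for (int i = 0; i < LANES; ++i) thenPath[i] = v[i] * 2.0f;   // executed for every lane
    for (int i = 0; i < LANES; ++i) elsePath[i] = v[i] - 1.0f;   // executed for every lane
    for (int i = 0; i < LANES; ++i) v[i] = mask[i] ? thenPath[i] : elsePath[i];
}

// x86 scalar style: each element takes a real branch, no wasted work.
void scalar_style(std::array<float, LANES>& v) {
    for (int i = 0; i < LANES; ++i) {
        if (v[i] > 0.0f) v[i] *= 2.0f;
        else             v[i] -= 1.0f;
    }
}

int main() {
    std::array<float, LANES> a{}, b{};
    for (int i = 0; i < LANES; ++i) a[i] = b[i] = i - 8.0f;   // mix of negative and positive lanes
    simd_style(a);
    scalar_style(b);
    std::printf("results match: %s\n", a == b ? "yes" : "no");
    return 0;
}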

Originally posted by: BenSkywalker
What he doesn't get into, and where I have always said Larry was going to run into problems, is in applying AF. You cannot ship a part today without AF and expect to even compete with nV/ATi's integrated solutions.

What makes you think it doesn't have AF?
Intel has said that one of the few fixed-function units they will add to Larrabee is texturing hardware. Now I would be surprised if they add texture units that DON'T perform AF. Even their IGPs can do AF quite well. That's the whole point of adding fixed-function texture units: getting efficient texture fetches.

Originally posted by: BenSkywalker
The tiles themselves have never been the problem; you can't check the visibility of a poly without knowing where the other polys in the scene are going to be. All of the geometric data must be binned.

It's NOT an 'infinite planes' solution. Visibility will simply be done through a z-buffer. It's just that the z-buffer can run in cache.
Therefore you also don't need to do all this 'bouncing around' that you kept talking about.
I figured you somehow had the wrong idea about Larrabee, because some of the things you said didn't make sense to me. Couldn't quite put my finger on it, but I think this is it.

Originally posted by: BenSkywalker
In particular, the blending ops after the texel sample phase. That is where Larry is going to roll over and die without pipelines, which it doesn't have. It cannot handle that level of execution with their architecture and be remotely competitive.

You mean texture filtering? As I said above, there's hardware for that.


Originally posted by: BenSkywalker
Honestly, based on Abrash's article it would seem that, unlike with PVR's setup, they will also pay a decent performance penalty for MSAA (as each chunk will require 4x the setup work for 4x AA, whereas with PVR it was 'free').

Have we read the same article?
MSAA works very elegantly because the rasterizer is 'iterative'. It doesn't go right down to pixel level, but starts on a larger grid and refines the solution at every step.
With a simple check it knows when all samples of a pixel are entirely inside the triangle, so it won't actually have to go down to the 4x sample level.
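
Roughly, the trick looks like this (my simplification in C++, not Abrash's actual code; the classifyBlock name and the block sizes are made up): evaluate a triangle's edge equation at two corners of a block, and you immediately know whether the whole block is inside, outside, or needs refining. A block that is fully inside all three edges trivially accepts every pixel and every MSAA sample in it, so only the 'partial' blocks ever get subdivided toward sample level.

// Hierarchical trivial accept/reject of a pixel block against one triangle edge.
#include <cstdio>

struct Edge { float a, b, c; };   // edge function: a*x + b*y + c >= 0 means 'inside'

// Returns +1 if the block [x0,x1]x[y0,y1] is fully inside the edge,
// -1 if fully outside, 0 if partially covered (subdivide and repeat).
int classifyBlock(const Edge& e, float x0, float y0, float x1, float y1) {
    // The extreme values of a linear function over a rectangle occur at corners
    // picked by the signs of its coefficients.
    const float minVal = e.a * (e.a >= 0 ? x0 : x1) + e.b * (e.b >= 0 ? y0 : y1) + e.c;
    const float maxVal = e.a * (e.a >= 0 ? x1 : x0) + e.b * (e.b >= 0 ? y1 : y0) + e.c;

    if (minVal >= 0) return +1;   // every pixel and every MSAA sample passes this edge
    if (maxVal <  0) return -1;   // nothing in the block touches the triangle
    return 0;                     // straddles the edge: refine on a finer grid
}

int main() {
    const Edge e{1.0f, 0.0f, -32.0f};   // the half-plane x >= 32
    std::printf("block x=48..64: %d\n", classifyBlock(e, 48, 0, 64, 16));   // +1, fully inside
    std::printf("block x=16..32: %d\n", classifyBlock(e, 16, 0, 32, 16));   //  0, straddles
    return 0;
}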
 

BenSkywalker

Diamond Member
Oct 9, 1999
9,140
67
91
The branching and threading that an x86 core supports are way different from how they are handled with GPGPU code.

LRB is P54C-based; it is an in-order core. It can run OoO about as well as current GPGPU solutions. As far as threading goes, how is it much different?

What makes you think it doesn't have AF?

They claimed they had texture sampling hardware; if they were to create a unified sampling unit pipelined to perform AF, they would be infringing on nV's patents, which they haven't licensed.

That's the whole point of adding fixed-function texture units: getting efficient texture fetches.

It's the blends that will kill them; if they have the process set up in a pipeline, it is a patent issue. Their existing IGPs fell under a cross-licensing agreement; since they have filed a suit against nV, that license no longer applies.

Visibility will simply be done through a z-buffer. It's just that the z-buffer can run in cache.

The Z of one triangle can run in cache, but if you don't bin the geometry first, how are you going to determine the closest-to-camera Z value for a given poly? They are going to have to transform, defer, then handle visibility checks.

Therefore you also don't need to do all this 'bouncing around' that you kept talking about.

Maybe, depending on how the scene ends up. Translucent and opaque triangles are going to create issues even in an ideal setup as far as cache is concerned. With a tessellated scene pushing well into the millions of polys per frame, we are talking better than a poly per pixel; we are going to have multiple pixels with large numbers of polys occupying the space. As geometric complexity increases, TBR gets more complex to handle no matter what approach you use. The actual visibility check is simple in LRB, under ideal circumstances.

MSAA works very elegantly because the rasterizer is 'iterative'. It doesn't go right down to pixel level, but starts on a larger grid and refines the solution at every step.
With a simple check it knows when all samples of a pixel are entirely inside the triangle, so it won't actually have to go down to the 4x sample level.

In the precise example Abrash gave, with the poly that crossed into multiple grids, if they failed to go through all of the iterations they would fail to apply any AA at all to at least one of the grids the poly falls into (because it wasn't far enough in to be read in the chunk on the adjacent grid). So I guess if you wanted to do a very cheap and ineffective AA you could do it with little performance penalty, but even given his own examples, it would fall down very easily.