Discussion on AA Efficiency

Gamingphreek

Lifer
Mar 31, 2003
With the release of the X1800s, we are now seeing AA that incurs very little performance hit. What is actually making it so good? Is it the ring bus, or is it merely that ATI has a better AA method (programmable)?

In contrast, why does Nvidia take such a big performance hit with AA, yet still have lower IQ?

-Kevin
 

EightySix Four

Diamond Member
Jul 17, 2004
The ring bus gives so much more bandwidth between the GPU and memory that it allows almost free AA. The Xbox 360 is going to have 10MB of eDRAM alongside the GPU for completely free AA.
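To put rough numbers on the Xbox 360 point, here's a minimal Python sketch of how much memory a multisampled frame needs. It assumes 32-bit color plus 32-bit depth/stencil per sample and treats the "10mb" as 10 MiB; the function names are hypothetical. Under those assumptions, 720p with 4xMSAA doesn't fit in one pass, which is why the console renders such frames in tiles.

```python
import math

# Assumption: the "10mb" of eDRAM is 10 MiB.
EDRAM_BYTES = 10 * 1024 * 1024


def framebuffer_bytes(width, height, samples, color_bytes=4, depth_bytes=4):
    """Bytes needed to store color + depth for every sample of every pixel."""
    return width * height * samples * (color_bytes + depth_bytes)


def tiles_needed(width, height, samples, budget=EDRAM_BYTES):
    """Number of screen tiles needed so that each tile fits in the budget."""
    return math.ceil(framebuffer_bytes(width, height, samples) / budget)


if __name__ == "__main__":
    for samples in (1, 2, 4):
        size = framebuffer_bytes(1280, 720, samples)
        print(f"720p {samples}x: {size / 2**20:.2f} MiB, "
              f"tiles = {tiles_needed(1280, 720, samples)}")
```

With these assumptions, 720p at 4x works out to about 28 MiB, so three tiles.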
 

Gamingphreek

Lifer
Mar 31, 2003
So are there any disadvantages to using a ring bus? If not, what is preventing Nvidia from moving to a "ring style" bus, along with programmable AA? It just seems like Nvidia, while they are A+ on releases and toe to toe with ATI (until AA is enabled), is lagging behind a little bit now.

-Kevin
 
otispunkmeyer

Jun 14, 2003
Originally posted by: crazySOB297
The ring bus gives so much more bandwidth between the GPU and memory that it allows almost free AA. The Xbox 360 is going to have 10MB of eDRAM alongside the GPU for completely free AA.

The memory is still only connected at 256 bits though. I don't believe ATI's great AA performance comes from the ring bus; I think it's simply more to do with the fact that their AA algorithm is programmable, while Nvidia still uses fixed sampling points.
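To make the fixed vs. programmable sample point idea concrete, here's a minimal Python sketch. It sweeps a near-vertical edge across one pixel and counts how many distinct coverage (shade) levels two hypothetical 4-sample patterns can produce. The patterns and function names are assumptions for illustration, not the actual positions either vendor uses. The point is that where the samples sit determines edge quality, and a programmable pattern lets the driver choose those positions instead of having them fixed in hardware.

```python
# Sample offsets inside a unit pixel, as (x, y) in [0, 1).
# Hypothetical 4-sample patterns, for illustration only.
ORDERED_GRID_4X = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
ROTATED_GRID_4X = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]


def coverage(samples, edge_x):
    """Fraction of samples lying left of a vertical edge at x = edge_x."""
    inside = sum(1 for x, _ in samples if x < edge_x)
    return inside / len(samples)


def distinct_levels(samples, steps=100):
    """Count distinct coverage values as a vertical edge sweeps across the pixel."""
    return len({coverage(samples, i / steps) for i in range(steps + 1)})


if __name__ == "__main__":
    print("ordered grid:", distinct_levels(ORDERED_GRID_4X))
    print("rotated grid:", distinct_levels(ROTATED_GRID_4X))
```

With these made-up patterns, the ordered grid only gives 3 steps (0, 1/2, 1) on a near-vertical edge, while the rotated one gives 5, because its four samples all have different x positions.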

The ring bus doesn't really give more bandwidth to the memory; I reckon it's more about bandwidth when shunting data around the GPU die.
 
otispunkmeyer

Jun 14, 2003
Originally posted by: Gamingphreek
So are there any disadvantages to using a ring bus? If not, what is preventing Nvidia from moving to a "ring style" bus, along with programmable AA? It just seems like Nvidia, while they are A+ on releases and toe to toe with ATI (until AA is enabled), is lagging behind a little bit now.

-Kevin

lol, I think NV have been playing catch-up with AA and AF ever since R300, whilst now ATI are playing catch-up with actually making and releasing their cards.
 

Gamingphreek

Lifer
Mar 31, 2003
Oh... yes, that would make sense. It is still a 256-bit memory interface, but the internal ring bus on the GPU itself is 512-bit. So that isn't it.

As for programmable AA, why in the hell is Nvidia not using this? Are they stupid or something? Not only does it seem to have better IQ, but it carries a fraction of the performance hit. Are there any downsides to the algorithm?

-Kevin
 

xtknight

Elite Member
Oct 15, 2004
It's because ATI's is implemented better. In ATI's Adaptive AA mode (MSAA+TSAA), there is the normal multisampling, and then it takes some of the textures (the alpha-tested ones) and supersamples them, as opposed to supersampling the whole screen. I also believe ATI's algorithm is selective between textures whereas NV's is not, which is where the advantage comes in. It seems the Radeon has always taken less of a hit with AA enabled, so I think it's more the algorithm than the hardware itself.

AT's article explains more than you'd ever want to know about this:
http://www.anandtech.com/video/showdoc.aspx?i=2552&p=6
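To see why supersampling only some textures is so much cheaper than supersampling the whole screen, here's a rough cost model in Python. It is a simplification under stated assumptions: it only counts shader/texture evaluations, ignores bandwidth and ROP cost, and the 10% of pixels covered by alpha-tested textures is a made-up figure. The function name is hypothetical.

```python
def shaded_samples(pixels, samples, alpha_fraction, mode):
    """Rough count of shader/texture evaluations per frame under a simple model.

    mode:
      "msaa"     - shade once per pixel; extra samples only store color/depth
      "ssaa"     - shade every sample of every pixel
      "adaptive" - MSAA everywhere, but supersample the pixels covered by
                   alpha-tested textures (alpha_fraction of the screen)
    """
    if mode == "msaa":
        return pixels
    if mode == "ssaa":
        return pixels * samples
    if mode == "adaptive":
        alpha_pixels = round(pixels * alpha_fraction)
        return (pixels - alpha_pixels) + alpha_pixels * samples
    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    pixels = 1600 * 1200
    for mode in ("msaa", "adaptive", "ssaa"):
        print(mode, shaded_samples(pixels, 4, 0.10, mode))
```

At 1600x1200 with 4 samples, this model gives about 1.3x the MSAA shading work for the adaptive mode, versus 4x for full-screen supersampling. Real costs depend heavily on how much of the scene uses alpha-tested textures.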

Originally posted by: Gamingphreek
Oh... yes, that would make sense. It is still a 256-bit memory interface, but the internal ring bus on the GPU itself is 512-bit. So that isn't it.

As for programmable AA, why in the hell is Nvidia not using this? Are they stupid or something? Not only does it seem to have better IQ, but it carries a fraction of the performance hit. Are there any downsides to the algorithm?

-Kevin

Well, the G70 is already released. They can't just change it. NVIDIA's AA is not programmable. I hope we see it in their next-gen GPU though.
 
otispunkmeyer

Jun 14, 2003
Originally posted by: xtknight
It's because ATI's is implemented better. In ATI's Adaptive AA mode (MSAA+TSAA), there is the normal multisampling then it takes all the textures and supersamples them. This is as opposed to supersampling the whole screen. I also believe ATI's algorithm is selective between textures whereas NV's is not, which is where the advantage comes in. The Radeon has always incurred less of a hit with AA enabled it seems, so I think it's more the algorithm than the hardware itself.

AT's article explains more than you'd ever want to know about this:
http://www.anandtech.com/video/showdoc.aspx?i=2552&p=6

Well, the G70 is already released. They can't just change it. I hope we see it in their next-gen GPU though.


I thought transparency AA was also a mix of supersampling and multisampling? Or does it do something different?
 

Gamingphreek

Lifer
Mar 31, 2003
Originally posted by: xtknight
It's because ATI's is implemented better. In ATI's Adaptive AA mode (MSAA+TSAA), there is the normal multisampling then it takes some of the textures and supersamples them. This is as opposed to supersampling the whole screen. I also believe ATI's algorithm is selective between textures whereas NV's is not, which is where the advantage comes in. The Radeon has always incurred less of a hit with AA enabled it seems, so I think it's more the algorithm than the hardware itself.

AT's article explains more than you'd ever want to know about this:
http://www.anandtech.com/video/showdoc.aspx?i=2552&p=6

Well, the G70 is already released. They can't just change it. NVIDIA's AA is not programmable. I hope we see it in their next-gen GPU though.

Well yeah, they can't change it, but are there any downsides to fully programmable AA? If not, I don't know what Nvidia was thinking with this old algorithm.

-Kevin
 

xtknight

Elite Member
Oct 15, 2004
Originally posted by: otispunkmeyer
I thought transparency AA was also a mix of supersampling and multisampling? Or does it do something different?

Multisampling won't do anything for textures. TxAA is just supersampling within alpha textures.

What you may mean is what is REALLY enabled, which is probably MSAA+TxAA, so yeah, in the overall picture they usually go together. But TxAA by itself is just supersampling within the alpha textures.
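A small Python sketch shows why multisampling alone does nothing inside alpha-tested textures, while supersampling them does. The assumption is that plain MSAA runs the shader, and therefore the alpha test, once at the pixel center and copies that result to every covered sample, whereas transparency supersampling runs the alpha test at each sample position. The wire texture, sample offsets and function names are all hypothetical.

```python
def alpha_test_coverage(alpha_fn, px, py, offsets, threshold=0.5):
    """Fraction of sample positions in pixel (px, py) that pass the alpha test."""
    passed = sum(1 for dx, dy in offsets if alpha_fn(px + dx, py + dy) >= threshold)
    return passed / len(offsets)


def wire_alpha(x, y):
    """Made-up alpha texture: a thin vertical wire covering 10.0 <= x < 10.3."""
    return 1.0 if 10.0 <= x < 10.3 else 0.0


# Plain MSAA: the alpha test runs once, at the pixel center.
CENTER_ONLY = [(0.5, 0.5)]
# Transparency supersampling: the alpha test runs at every sample position.
SUPERSAMPLED_4X = [(0.375, 0.125), (0.875, 0.375), (0.125, 0.625), (0.625, 0.875)]

if __name__ == "__main__":
    print("center only:", alpha_test_coverage(wire_alpha, 10, 0, CENTER_ONLY))
    print("supersampled:", alpha_test_coverage(wire_alpha, 10, 0, SUPERSAMPLED_4X))
```

With the center-only test, the thin wire misses the center and vanishes from that pixel entirely (0.0), which is where the shimmering on fences and foliage comes from. With the alpha test evaluated at four sample positions, one of them catches the wire, giving a partial coverage of 0.25 that blends smoothly.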

Originally posted by: Gamingphreek
Well yeah, they can't change it, but are there any downsides to fully programmable AA? If not, I don't know what Nvidia was thinking with this old algorithm.

-Kevin

They got lazy. :shocked: This is why it's good to have competition.
 

Munky

Diamond Member
Feb 5, 2005
From what I've read about the X1K, the ring bus does help improve memory access. It's still a 256-bit bus, but the latency is lower, and it routes data to where it's needed most first, so overall it probably does help with AA. Then you add 1500MHz memory, and that just makes it even better.
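For a sense of what "256-bit plus 1500MHz memory" means in raw numbers, here is the usual peak bandwidth arithmetic as a small Python function. It assumes "1500MHz" is the effective data rate (GDDR3 at 750MHz, double data rate), and the function name is hypothetical. It also backs up the earlier point that the external bus is still 256 bits: the ring bus doesn't change this peak figure. Any gain comes from using that bandwidth more efficiently.

```python
def peak_bandwidth_gb_s(bus_width_bits, effective_rate_mt_s):
    """Peak memory bandwidth in GB/s (10**9 bytes per second).

    bytes per transfer   = bus width in bits / 8
    transfers per second = effective data rate (MT/s) * 10**6
    """
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_rate_mt_s * 1e6 / 1e9


if __name__ == "__main__":
    # Assumed: 256-bit bus at 1500 MT/s effective.
    print(peak_bandwidth_gb_s(256, 1500))
```

That works out to 32 bytes per transfer at 1.5 billion transfers per second, or 48 GB/s peak.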
 

Drayvn

Golden Member
Jun 23, 2004
Originally posted by: munky
From what I've read about the X1K, the ring bus does help improve memory access. It's still a 256-bit bus, but the latency is lower, and it routes data to where it's needed most first, so overall it probably does help with AA. Then you add 1500MHz memory, and that just makes it even better.

Munky is right about the ring bus. From what I've read and understand, the ring bus actually lets the GPU get the required data far quicker than it could with 4x64-bit memory channels, and part of that is down to it using 8x32-bit channels instead. Because each channel is narrower, there is less latency and the data can be read quicker. The ring bus also directs requests to the right channel faster than having to check each one individually first.

From what I think B3D said in their overview, requesting data is far more efficient because the memory controller in the middle can prioritise each request and send it to the right set of memory channels. The data then travels over the ring bus itself (which can carry data in both directions at the same time for maximum bandwidth) to whichever of the four ring stops is closest to the requesting client, and that ring stop passes the data on to the GPU. This is far more efficient than not having the ring bus as the go-between.

Without it, the memory controller would have to search through each memory channel to find the right data to send back to the GPU, which can significantly increase latency depending on where the data is.
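The "both directions at once" part of the ring bus can be illustrated with a toy Python model: four ring stops (the number given in the post), with every hop between neighbouring stops assumed to cost the same. This is a hypothetical simplification that ignores the memory controller's arbitration and prioritisation, not ATI's actual design. It shows why being able to go either way around the ring shortens the trip.

```python
from itertools import product


def hops(src, dst, stops=4, bidirectional=True):
    """Number of ring-stop hops from src to dst on a ring of `stops` stops."""
    forward = (dst - src) % stops
    if not bidirectional:
        return forward
    # Data can also travel the other way around the ring; take the shorter path.
    return min(forward, stops - forward)


def worst_and_average(stops=4, bidirectional=True):
    """Worst-case and mean hop count over every (src, dst) pair of stops."""
    all_hops = [hops(s, d, stops, bidirectional)
                for s, d in product(range(stops), repeat=2)]
    return max(all_hops), sum(all_hops) / len(all_hops)


if __name__ == "__main__":
    print("one direction:", worst_and_average(bidirectional=False))
    print("both directions:", worst_and_average(bidirectional=True))
```

In this model, a one-way ring with four stops has a worst case of 3 hops and an average of 1.5. Allowing both directions cuts that to a worst case of 2 and an average of 1.0.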