Doom III's "Textures per Pass": explain why 4x2=4 and 2x3=3?

Peter007

Platinum Member
May 8, 2001
According to John Carmack, Doom III is a game that uses 6 multi-texture layers.
Which is why he chose the ATI Radeon 8500 over the GeForce 4600 as his E3 demonstration graphics card.

Soon after, many online reviewers started publishing graphics cards' "textures per pass" ability, such as Digit-Life on the Xabre 400:
                                  Xabre    Radeon 7500    MX460
Pipelines                           4           2            2
Texture units per pipeline          2           3            2
Textures per pass                   4           3            2


This makes NO SENSE: the ATI Radeon 7500 has 2 rendering pipelines, and each pipeline is capable of processing 3 textures. So in theory, the Radeon 7500 should be able to process 6 textures at once (2 pipelines x 3 textures/pipeline = 6 textures).

Why is it only 3 textures????
 

Goi

Diamond Member
Oct 10, 1999
That's what I'm confused about, and wanna know as well. Several sites report the same figures... 1-2 years ago it was a simple multiplication of the # of graphics pipelines by the texturing units per pipeline to get the number of textures per pass. Now apparently the math is different. I suspect the introduction of pixel and vertex shaders has something to do with this...
 

kami

Lifer
Oct 9, 1999
Which is why he chose the ATI Radeon 8500 over the GeForce 4600 as his E3 demonstration graphics card.
Wrongo...they used a Radeon R300 (next gen ATI card) to run the Doom3 demo at E3. It's a generation ahead of the GF4, and that's why it won. Carmack still says the best card you can buy today for Doom3 is a GF4 Ti (he said this in some interviews around the time of E3).

Also, here's an interesting read from Carmack's .plan update from a while ago:

Last month I wrote the Radeon 8500 support for Doom. The bottom line is that
it will be a fine card for the game, but the details are sort of interesting.

I had a pre-production board before Siggraph last year, and we were discussing
the possibility of letting ATI show a Doom demo behind closed doors on it. We
were all very busy at the time, but I took a shot at bringing up support over
a weekend. I hadn't coded any of the support for the custom ATI extensions
yet, but I ran the game using only standard OpenGL calls (this is not a
supported path, because without bump mapping everything looks horrible) to see
how it would do. It didn't even draw the console correctly, because they had
driver bugs with texGen. I thought the odds were very long against having all
the new, untested extensions working properly, so I pushed off working on it
until they had revved the drivers a few more times.

My judgment was colored by the experience of bringing up Doom on the original
Radeon card a year earlier, which involved chasing a lot of driver bugs. Note
that ATI was very responsive, working closely with me on it, and we were able
to get everything resolved, but I still had no expectation that things would
work correctly the first time.

Nvidia's OpenGL drivers are my "gold standard", and it has been quite a while
since I have had to report a problem to them, and even their brand new
extensions work as documented the first time I try them. When I have a
problem on an Nvidia, I assume that it is my fault. With anyone else's
drivers, I assume it is their fault. This has turned out correct almost all
the time. I have heard more anecdotal reports of instability on some systems
with Nvidia drivers recently, but I track stability separately from
correctness, because it can be influenced by so many outside factors.

ATI had been patiently pestering me about support for a few months, so last
month I finally took another stab at it. The standard OpenGL path worked
flawlessly, so I set about taking advantage of all the 8500 specific features.
As expected, I did run into more driver bugs, but ATI got me fixes rapidly,
and we soon had everything working properly. It is interesting to contrast
the Nvidia and ATI functionality:

The vertex program extensions provide almost the same functionality. The ATI
hardware is a little bit more capable, but not in any way that I care about.
The ATI extension interface is massively more painful to use than the text
parsing interface from nvidia. On the plus side, the ATI vertex programs are
invariant with the normal OpenGL vertex processing, which allowed me to reuse
a bunch of code. The Nvidia vertex programs can't be used in multipass
algorithms with standard OpenGL passes, because they generate tiny differences
in depth values, forcing you to implement EVERYTHING with vertex programs.
Nvidia is planning on making this optional in the future, at a slight speed
cost.

I have mixed feelings about the vertex object / vertex array range extensions.
ATI's extension seems more "right" in that it automatically handles
synchronization by default, and could be implemented as a wire protocol, but
there are advantages to the VAR extension being simply a hint. It is easy to
have a VAR program just fall back to normal virtual memory by not setting the
hint and using malloc, but ATI's extension requires different function calls
for using vertex objects and normal vertex arrays.

The fragment level processing is clearly way better on the 8500 than on the
Nvidia products, including the latest GF4. You have six individual textures,
but you can access the textures twice, giving up to eleven possible texture
accesses in a single pass, and the dependent texture operation is much more
sensible. This wound up being a perfect fit for Doom, because the standard
path could be implemented with six unique textures, but required one texture
(a normalization cube map) to be accessed twice. The vast majority of Doom
light / surface interaction rendering will be a single pass on the 8500, in
contrast to two or three passes, depending on the number of color components
in a light, for GF3/GF4 (*note GF4 bitching later on).

Initial performance testing was interesting. I set up three extreme cases to
exercise different characteristics:

A test of the non-textured stencil shadow speed showed a GF3 about 20% faster
than the 8500. I believe that Nvidia has a slightly higher performance memory
architecture.

A test of light interaction speed initially had the 8500 significantly slower
than the GF3, which was shocking due to the difference in pass count. ATI
identified some driver issues, and the speed came around so that the 8500 was
faster in all combinations of texture attributes, in some cases 30+% more.
This was about what I expected, given the large savings in memory traffic by
doing everything in a single pass.

A high polygon count scene that was more representative of real game graphics
under heavy load gave a surprising result. I was expecting ATI to clobber
Nvidia here due to the much lower triangle count and MUCH lower state change
functional overhead from the single pass interaction rendering, but they came
out slower. ATI has identified an issue that is likely causing the unexpected
performance, but it may not be something that can be worked around on current
hardware.

I can set up scenes and parameters where either card can win, but I think that
current Nvidia cards are still a somewhat safer bet for consistent performance
and quality.

On the topic of current Nvidia cards:

Do not buy a GeForce4-MX for Doom.

Nvidia has really made a mess of the naming conventions here. I always
thought it was bad enough that GF2 was just a speed bumped GF1, while GF3 had
significant architectural improvements over GF2. I expected GF4 to be the
speed bumped GF3, but calling the NV17 GF4-MX really sucks.

GF4-MX will still run Doom properly, but it will be using the NV10 codepath
with only two texture units and no vertex shaders. A GF3 or 8500 will be
much better performers. The GF4-MX may still be the card of choice for many
people depending on pricing, especially considering that many games won't use
four textures and vertex programs, but damn, I wish they had named it
something else.

As usual, there will be better cards available from both Nvidia and ATI by the
time we ship the game.


8:50 pm addendum: Mark Kilgard at Nvidia said that the current drivers already
support the vertex program option to be invariant with the fixed function path,
and that it turned out to be one instruction FASTER, not slower.
 

Goi

Diamond Member
Oct 10, 1999
Interesting bit of info, but still doesn't answer the question.
 

kami

Lifer
Oct 9, 1999
I never claimed it did, I was just correcting something he said. :rolleyes:
 

Goi

Diamond Member
Oct 10, 1999
So, looks like either nobody around here knows the answer, or nobody around here cares...
 

SYST3M

Senior member
Apr 18, 2000
Well, because there are 6 textures, and the Radeon 8500 is the only card out right now that can work on the same pixel with different texture pipelines. So if you have a GF4, which I think is 4 pipelines with 2 textures each, you can either work on 4 different pixels with 2 textures each, or 1 pixel with 4 textures. The same goes for the Radeon 7500 and below, as well as all the other GeForce cards. The Radeon 8500 has either 2 pipelines with 3 textures each or vice versa, but since it has the ability to work on the same pixel in both pipelines (I think it has to do with register combiners, but I am probably wrong), it can do all 6 textures in a single pass.
 

Goi

Diamond Member
Oct 10, 1999
SYST3M, I'm sorry, but I don't quite get what you're saying, although a few things you said are indeed right. The GF3 and GF4 do indeed have 4 pixel pipelines, where each pipeline has the ability to apply 2 textures to a pixel. The Radeon 7500, OTOH, has 2 pixel pipelines, where each pipeline has the ability to apply 3 textures to a pixel. So what Peter007 and I are assuming is that the Radeon can hence apply 3 different textures to each of 2 different pixels, giving 2 texels with 3 different textures each, which makes 6 total unique textures applied. The GF3/GF4, OTOH, can apply 2 different textures to each of 4 different pixels, giving 4 texels with 2 different textures each, which makes 8 total unique textures applied.

Unless, of course, there is some limitation on the uniqueness of the textures being applied, that is, a limitation whereby the 3 textures applied to the 2 different pixels must be the same for both pixels in the Radeon's case, or the 2 textures applied to the 4 different pixels must be the same in the GF3/GF4's case. Then the numbers would make sense, but I see no reason why ATI and nvidia would impose such a limitation on their hardware. Also, even if the above limitation were indeed true, it still would not explain the Xabre's figure of 4 textures per pass; it would be 2 instead.

All this just adds to the confusion...
 

Peter007

Platinum Member
May 8, 2001
Kami, thanks for your informative input, however it didn't answer the question I'm asking.

I don't care to turn this into a debate over whether it was the R300 or the 8500. That has been done in another thread already!

Okay, for the sake of argument, let's say it was the upcoming R300. It still doesn't answer my question regarding

the mysterious TEXTURES per PASS.

Please, someone's got to know something??? Any game programmers care to give a guess???
 

eaadams

Senior member
Mar 2, 2001
Isn't it that the ATI can draw 6 textures in one pass while the NVIDIA can draw 8 in two?
 

BenSkywalker

Diamond Member
Oct 9, 1999
First off, forget Doom3 for a minute completely, I'll get to that in a bit but it will just confuse you if you don't have the basics down.

A texture *pass* is not necessarily one clock, or any fixed number of clocks for that matter. With the Kyro series of boards as an example, there is really only an artificial limit to the number of textures that can be applied per *pass*; however, it needs as many clock cycles as the number of texture layers you are applying. If you have ten texture layers, you will need ten clock cycles to apply them.

The reason the term *pass* is used is that exceeding the number of texture layers a board supports in one *pass* means you need to rehandle all of the geometry for the scene again.

When you look at the pixel pipes for any card, each of them is working on its own pixel. If you had a chip with one hundred pixel pipes, it would be working on one hundred different pixels at once. The texture units per pipe indicate how many texture layers a board can apply in one clock cycle for each of its pixel pipes. The Radeon 7500, as an example, can apply up to three different texture layers to two different pixels in one clock. That is where the 3 layers per pass comes from.

The GeForce3/4 use a *loopback* which allows them to refeed the same pixel down the pipe twice to apply double the number of texture layers without having to rehandle the geometry. If you were to have, say, eight texture layers being applied, then the GF4 would need two passes (handling geometry twice), the R7500 would need three, and the GF4MX would need four. Oversimplified a bit, the GF4MX would need four times the geometric power to stay even with the GF4, etc.

For the Kyro that I mentioned earlier, it doesn't work like any other board (outside of other PowerVR offerings), as it handles scene management in a completely different way, so it shouldn't be compared directly; however, it is, in theory at least, possible for it to use a thousand texture layers per *pass*.

Doom3 is an entirely different beast to discuss and significantly more complicated.

Doom3 uses pixel shader effects heavily, and not all pixel shaders are created equal. Because of some of the differences in what is allowed between the GF3/4 and the R8500, the latter is capable of calculating certain effects with fewer *layers/passes* than a GeForce3/4. Hard to think of a way to describe this simply.....

Say you wanted to take Color 1 and Color 2 and combine them and then add half the value of Color 3. On the GF3/4 you may have to go

C1 + C2 = X
C3/2= Y
X+Y= Final color

On the R8500 you may be capable of:

(C1+C2)+(C3/2) = Final Output color

So the R8500 may be able to do a particular effect in a single pass that would take the GF3/4 multiple passes to handle, despite the GeForce4 being able to handle more traditional texture layers in a single pass. Because of this, it was expected that the R8500 would likely be considerably faster than the GF4, but that didn't end up being the case (the GF4 actually edged it out). The R8500 does need fewer passes than a GeForce4 to handle Doom3; it doesn't have to do with the number of texture layers per pass supported, however. Moving forward, expect the flexibility of the pixel pipeline to become increasingly important when looking at the number of "textures/layers/passes".
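To make the pass-count difference concrete, here is a toy Python sketch (my own analogy, not real combiner or driver code) where each intermediate result that has to be written out stands in for an extra pass over the geometry:

def gf3_style(c1, c2, c3):
    # Less flexible fragment stage: the intermediate result X has to be
    # written out first, so finishing the expression costs a second pass.
    x = c1 + c2            # pass 1
    final = x + (c3 / 2)   # pass 2
    return final, 2

def r8500_style(c1, c2, c3):
    # The whole expression folds into one fragment setup, so one pass.
    return (c1 + c2) + (c3 / 2), 1

assert gf3_style(0.2, 0.3, 0.5)[0] == r8500_style(0.2, 0.3, 0.5)[0]  # same color, fewer passes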

All those shaders nVidia started hyping last year really do offer loads of visual enhancements when utilized properly, but they make analyzing how a given chip will handle a particular game that uses them considerably more complex.
 

merlocka

Platinum Member
Nov 24, 1999
Thanks for the explanation Ben. Where have you been? Haven't seen your posts in video for a while. Any thoughts on Parhelia?
 

Peter007

Platinum Member
May 8, 2001
BenSkywalker............................... You're the MAN!

Awesome, finally someone with the answer :)

I'm gonna read it over a couple more times before I fully understand it. But this was EXACTLY the kind of answer I was looking for :)
 

BenSkywalker

Diamond Member
Oct 9, 1999
merlocka

Where have you been?

In terms of forum participation most of my time has been spent over at Beyond3D lately, although I've cut way back on the amount of time I spend at discussion forums at all. I've been working a lot more on reviews and such for the Basement, which requires that I spend a lot more time playing games too(not easy with the kids, wife and job also taking up time :) ).

Any thoughts on Parhelia?

I think people's reactions are going to depend a lot on what they expect from it. It isn't going to be an insanely fast card compared to the R8500 or GeForce4 under normal operation, although it should be considerably faster when you add high-tap anisotropic filtering and their form of anti-aliasing. The new features in the card, particularly displacement mapping, sound promising. Unlike N-Patches or other forms of HOS, displacement mapping gives developers a good amount of control to manipulate exactly what will be tessellated out at raster time, although with Matrox's minuscule market penetration the amount of support is likely to remain very low until nVidia adds support to their hardware (and even then it will take some time before it is commonplace).

The type of anti-aliasing they are using on the board isn't 100% compatible with all games; that could be a problem if they don't offer an MSAA-like substitute (which it should scream at, given its bandwidth). Given the level of bandwidth and raw fill this board has, 1600x1200x32 should be absolutely no problem whatsoever, so the AA performance and high-tap anisotropic numbers will be the most important and the ones that people should be focusing on, not the standard tests that most sites are likely to focus on.

The vertex throughput the card is capable of sounds extremely promising; twice as fast as a GeForce4 clock for clock would make it a T&L beast. My biggest hesitation about the board, and this one will likely be a large factor in whether I purchase one or not, is the level of OpenGL support in their drivers. If anyone still recalls, the last competitive part Matrox had in the 3D space, the G400, was actually the best hardware available when new, however the board was plagued with serious OpenGL problems, not to mention its extremely limited availability. Another issue is that the pixel shaders are a bit behind the curve compared to what you would expect a late Q2 '02 board to have. Based on everything I have seen, this isn't going to be an issue in terms of what you can do in the near future (it will simply take more passes), so it likely won't end up being an issue in terms of end user experience, as the massive amount of raw T&L power combined with the incredible bandwidth should give it enough headroom to remain easily competitive with, if not a decent amount faster than, any available competition.

With all of that said, we need to see street prices lower than what they are talking about, by a decent amount. Matrox isn't nVidia; they haven't been dominating the 3D performance market for many years now. The casual consumer isn't going to want to spend $400 on a Matrox part, and the word-of-mouth advantage won't really kick in until around the launch of the R300/NV30. If this is the first of what Matrox plans on being a regular high-end 3D part, then it is an excellent way to get started. If they are expecting this board to dominate without a built-up reputation in the market before moving ahead, then we will almost assuredly see Matrox 'vanish' from the 3D market again.

They are not going to have a major commercial success with this board or gain a significant amount of marketshare unless the street price ends up quite a bit lower than what they are talking about now. Sure, they will get some of us in the enthusiast camp to move over to a Matrox part, because we know who they are and we will know enough about the hardware. But much like it took nVidia a long time to make headway into the retail 3D market, it will take Matrox quite a bit of reputation building to make big strides there. As an OEM option, nVidia has the bulletproof driver reputation, and even then their high-end 3D parts are nearly nonexistent in terms of quantity. I hope Matrox has realistic expectations and will stay in the market. The Parhelia is a very promising piece of hardware with a lot of potential, and adding a third major player to the consumer 3D market will serve to benefit us considerably.
 

BenSkywalker

Diamond Member
Oct 9, 1999
Pete-

No problem. I can't edit my posts without logging in, so I want to mention real quick that I made a mistake in my post: it would take the GF4MX twice the geometric throughput of the GF4 Ti, not four times. I was tired when posting that, my apologies. Does anyone think it is worth adding a more in-depth explanation of passes and how pixel shaders impact them to the AnandTech FAQ? If it's worth it, I'll get it done and send it over to Andy.
 

onelin

Senior member
Dec 11, 2001
I find it quite interesting ;) If others do too, then I'd say go for it. This thread offered a deeper understanding of how these new video cards work... but I'd love to learn more. Thanks for the good posts!
 

Goi

Diamond Member
Oct 10, 1999
Thanks for clearing that up Ben...I had to scour many websites and reviews before I came up with the answer that you provided in that single post.
 

Peter007

Platinum Member
May 8, 2001
By all means, please do, Jedi Ben!

Enlighten us with your wisdom. If you have a website with graphical reviews, it would be even better.

Too bad most graphics card makers don't post their "true" textures-per-pass ability.

I have one question: the Kyro II can, in theory, do something like 8 textures at once. BUT in reality, what can it do?

Ben, if you had to take a "guess ranking" of current cards based on what we know about Doom III so far, how would you rank each graphics card?
 

BenSkywalker

Diamond Member
Oct 9, 1999
I have one question: the Kyro II can, in theory, do something like 8 textures at once. BUT in reality, what can it do?

Well, it can do eight textures in one pass, but it has a single texture unit for both of its pixel pipes, so one texture layer is being applied to two different pixels per clock. The Kyro2 uses tile-based rendering, so you can't directly compare it in terms of raw fill with an immediate mode renderer (everything besides the Kyro2 currently on the market). The Kyro2 doesn't draw what you don't see, so discussing fillrate issues with it is quite a bit more complex than comparing two different IMRs. This covers the board fairly in depth.
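As a rough sketch of that trade-off (my own numbers taken from this thread, not PowerVR documentation), assuming an eight-layer-per-pass limit and one texture layer applied per clock:

from math import ceil

KYRO2_LAYERS_PER_PASS = 8   # per-pass limit as quoted in this thread

def kyro2_cost(layers):
    passes = ceil(layers / KYRO2_LAYERS_PER_PASS)  # geometry resubmissions
    clocks = layers                                # one layer per clock, shared texture unit
    return passes, clocks

print(kyro2_cost(8))   # (1, 8): one pass, but still eight clocks of texturing per pixel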

Ben, if you had to take a "guess ranking" of current cards based on what we know about Doom III so far, how would you rank each graphics card?

Currently available? Take Carmack's word as pretty much gold on that subject; the only things that may make what he is saying less than 100% accurate are driver revisions changing things considerably before the game comes out, or if he himself implements some changes in the rendering engine that have a performance impact. I couldn't carry Carmack's jock strap in general 3D knowledge, he's way out of my league, never mind his own game :)

For upcoming cards, I haven't seen enough on the R300 or NV30 to make a comment yet either way. Doom3 will rely a lot more on pixel shader throughput than anything else based on everything I have seen; the poly counts are actually very low (although the lighting is very complex). Is there a particular upcoming board that you are wondering about? If you were around when I was posting here previously: I don't have the kind of access to information on unreleased products that I used to, so I can't say too much with accuracy one way or the other.

If you have a website with graphical reviews, it would be even better.

Manufacturers don't send my site review samples, unfortunately, so I only review what I buy or what I have available for extended testing. I write for GameBasement, and the site's primary focus, as is mine, is gaming. So if I can buy sixteen games for $800 or two graphics cards, I have to go with the former. If you can get nVidia/ATI/Matrox/PowerVR to send me review samples, I'd be more than happy to do them up though ;) :D